



















MATHEMATICS 
OF STATISTICS 

PART TWO


BY 

JOHN F. KENNEY 

Northwestern University 



LONDON 

CHAPMAN & HALL LTD. 

11 HENRIETTA STREET, W.C.2 
1940 





Copyright, 1939, by 

D. VAN NOSTRAND COMPANY, INC. 


All Rights Reserved 


PRINTED IN THE UNITED STATES OF AMERICA 



Matri Meae 
in Signum 
Gratitudinis



PREFACE 


There are two quite distinct aspects or levels of mathematical 
statistics. The one involves elementary mathematics and the 
methodologies serve descriptive purposes. These fundamentals are 
set forth in Part I. The other aspect is essentially mathematical in 
character and the methodologies are developed for inferential purposes.
It cannot be made elementary by its very nature because
the problems are so difficult that powerful mathematical tools are 
necessary to provide solutions of the problems. 

In recent years great advances have been made in statistical 
theory. Methods of formulating and testing hypotheses have been 
systematically developed and a sound basis for statistical inference 
has replaced older methods involving the intuitive notions of “probable
error.” In this book I have elected to include some of the
classical theory and some of the simpler concepts and techniques of 
the modern theory. In short, I have made a sustained effort to write 
an up-to-date text which will serve to prepare the student for the 
really mathematical part of the theory of statistics. A knowledge 
of elementary probability theory, calculus, and determinants is presupposed.
It is also understood that the student is familiar with the
rudiments of statistics such as are given in Part I. However, if no 
preliminary course in statistics has been studied, mature students 
should be able to acquire the essential definitions and concepts in a 
rapid survey of Part I. 

Of the books which have been particularly useful in preparing the 
manuscript, I would name the following: Camp’s Mathematical 
Statistics, Fisher’s Statistical Methods For Research Workers, Fry’s 
Probability And Its Engineering Uses, Rietz’s Mathematical Statistics, 
and Wilks’ Statistical Inference. I have also derived much help from 
certain papers in the literature by Professors Carver, Jackson, Rider, 
and Rietz. Specific reference to these papers is made in the text. 
A reference list of pertinent books and papers is given at the end of 
each of the last three chapters. It is recommended that some of 
these be available to the student for supplementary study in connection
with this text.





I am indebted in a very special way to Professor Allen T. Craig: 
for material taken in his lectures, for a keenly critical reading of the 
manuscript, and for assistance with the proof reading. Without his 
help and encouragement Part II could hardly have been written. 

Finally, I wish to express my appreciation to D. Van Nostrand 
Company, Inc., for their efficient and cheerful cooperation in the 
manufacture of both parts of the book. 

John F. Kenney 

Evanston, Illinois. 

April, 1939 





CONTENTS

CHAPTER I

PROBABILITY AND ITS RELATION TO STATISTICAL THEORY. THE
BERNOULLI DISTRIBUTION. APPROXIMATIONS BY MEANS OF
THE NORMAL CURVE AND POISSON EXPONENTIAL

1. Importance
2. Definitions
3. Theorems
4. Supplementary Reading
5. Repeated Trials
6. Relative Frequencies From Dichotomous Samples
7. Theorem of Bernoulli
8. Binomial Description of Frequency
9. Graphical Representation
10. The Mean and Standard Deviation
11. Skewness and Kurtosis
12. A Recursion Formula
13. Mathematical Expectation
14. Approximating the Binomial with the Normal Curve
15. Simple Sampling of Attributes
16. Probable Error
17. Standard Error and Correlation of Errors in Class Frequencies
18. The Poisson Exponential

CHAPTER II

SOME USEFUL INTEGRALS AND FUNCTIONS

1. The Gamma Function
2. Stirling’s Approximation
3. The Beta Function
4. Reduction to Gamma and Beta Functions
5. Incomplete Beta and Gamma Functions

CHAPTER III

GENERAL CONCEPTS OF DISTRIBUTION FUNCTION OF A CONTINUOUS
VARIABLE. GENERALIZED FREQUENCY CURVE

1. Fundamental Notions and Definitions
2. Moments
3. The Pearson System





























4. Genesis of the Pearson Curves in the Theory of Probability
5. Further Discussion of the Normal Curve
6. The Gram-Charlier Series

CHAPTER IV

JOINT DISTRIBUTIONS OF TWO VARIABLES. THE NORMAL
CORRELATION SURFACE

1. Fundamental Notions
2. Moments
3. Regression
4. The Standard Error of Estimate
5. The Normal Correlation Surface
6. Limiting Forms
7. Tetrachoric Correlation

CHAPTER V

MULTIPLE AND PARTIAL CORRELATION

1. Notation
2. Regression
3. Standard Error of Estimate
4. Standard Deviation of Estimated Values
5. Multiple Correlation Coefficient
6. Limiting Cases
7. Partial Correlation
8. An Alternate Derivation

CHAPTER VI

FUNDAMENTALS OF SAMPLING THEORY WITH SPECIAL
REFERENCE TO THE MEAN

1. Introduction
2. Method of Attack
3. Expected Values
4. Standard Error of a Linear Function of Variables
5. Theorems
6. An Experiment
7. Reproductive Property of the Normal Law
8. Non-Normal Universes
9. Tchebycheff’s Inequality
10. Law of Large Numbers
11. Probability Scale of Sampling Fluctuations
12. Null Hypothesis and Significance Tests
13. Size of Sample to Have a Given Reliability
14. Difference in Proportions




































CHAPTER VII

SMALL OR EXACT SAMPLING THEORY

1. Introduction
2. Expected Value of $s^2$
3. Unbiased Estimates of Population Parameters
4. Degrees of Freedom
5. “Student’s” Distribution
6. Fisher’s Derivation
7. Distribution of $\bar{x}$, $s^2$, and $s$, Taken Singly
8. The $(\bar{x}, s)$-Frequency Surface
9. Fisher’s t-Distribution
10. Difference between Two Means
11. Fisher’s z-Distribution
12. Significance of Difference Between Variances
13. Analysis of Variance
14. Testing Variation in Sub-sets of Means
15. Testing Linear Regression
16. Tests of Significance of r

CHAPTER VIII

A. THE $\chi^2$ DISTRIBUTION
B. STATISTICAL INFERENCE

1. The Multinomial Law
2. The $\chi^2$ Distribution
3. Tables
4. Applications
5. Induction versus Deduction
6. Bayes’ Theorem
7. Probable Error
8. Fiducial Theory
9. Fiducial Limits

APPENDIX

Table I. Ordinates and Areas of the Normal Curve
Table II. 5% and 1% Points for the Distribution of F
Table III. Table of $\chi^2$ Probability Scale

Index
































MATHEMATICS OF STATISTICS 


CHAPTER I 

PROBABILITY AND ITS RELATION TO STATISTICAL THEORY. THE 
BERNOULLI DISTRIBUTION. APPROXIMATIONS BY MEANS OF THE 
NORMAL CURVE AND POISSON EXPONENTIAL FUNCTION 

1. Importance. The subject of probability deals with one of the 
most interesting branches of modern mathematics and is becoming
conspicuous for its applications in many fields of learning. This subject
is of fundamental importance, not only in the theory of insurance
and statistics, but also in various branches of the biological and
physical sciences. The following quotations from contemporary
writers indicate the importance of probability theory in the philosophy
of modern science.

It was, I think, Huxley who said that six monkeys, set to strum unintelligently 
on typewriters for millions of millions of years, would be bound in time to write 
all the books in the British Museum. If we examined the last page which a 
particular monkey had typed, and found that it had chanced, in its blind strumming,
to type a Shakespeare sonnet, we should rightly regard the occurrence as a
remarkable accident, but if we looked through all the millions of pages the monkeys
had turned off in untold millions of years, we might be sure of finding a
Shakespeare sonnet somewhere amongst them, the product of the blind play of
chance. . . .

These and other considerations have led many physicists to suppose that there 
is no determinism in events in which atoms and electrons are involved singly, and 
that the apparent determinism in large-scale events is only of a statistical nature. 
When we are dealing with atoms and electrons in crowds, the mathematical law 
of averages imposes the determinism which physical laws fail to provide. . . . 
We can only speak in terms of probabilities. 

— The Mysterious Universe, Sir James Jeans. 

In order to understand the nature of knowledge about social and economic life,
it is necessary to know something about the theory of probability; because
knowledge in these fields, in general, is essentially indeterminate knowledge.
There are two fundamental ideas which need to be grasped in order to understand
the social sciences. The first idea is that all science is philosophical. . . . The
time honored aim of philosophy has been to discover and interpret (to the extent
possible to the human mind) the characteristics of nature. By nature is meant 
all things, material and psychic, external to man, and man himself. In many 
fields the minds of men have penetrated into the mysteries of nature and have 
produced knowledge concerning them. In the physical aspects (both external 
to man and in man) great progress has been made towards the attainment of 
apparently precise knowledge, within certain definite limits; while in the field 
of the psychic the progress has been towards increasing the probabilities of truth 
of a great variety of hypotheses. But it is characteristic of the psychic aspects 
of knowledge that the facts in those fields are indeterminate, not precise, and 
apparently dynamic. Even in the physical and chemical world, the discoveries 
of recent years have emphasized a great realm of indeterminacy, particularly 
when confronting great velocities and infinitely small particles within the atom. 
Thus the second idea to grasp is that in all fields of knowledge, even the physical, 
beyond the limited range of relatively precise knowledge accumulated by man, 
there is a vast frontier of speculation. It has been the function of scientific 
method — the new tool of philosophy — to penetrate ever deeper into this realm 
of speculative knowledge. Primarily this has been made possible by the development
of the theory of probabilities.

— Elementary Statistics, James G. Smith. 

There exist in nature systems of chance causes which operate in such a way 
that the effects of these causes can be predicted, by making use of customary
probability theory in which objective probabilities in the limiting statistical 
sense are substituted for the mathematical probabilities. 

— Economic Control of Quality of Manufactured Product, W. A. Shewhart. 

It appears likely that the further development of the theory of probability in 
the next few decades may turn out to be a major chapter in the history of science. 

— Science, January 18, 1929. 

The great extension in the use of statistics in the last two decades has been 
associated with and largely made possible by mathematical developments based 
upon the theory of probability. 

— Harold Hotelling, Journal American Statistical Association, March 
Supplement, 1931.

2. Definitions. Inasmuch as the subject of probability plays an 
important role in certain phases of statistical theory, we will now 
consider some of the fundamental principles of this subject. It will be 
convenient to divide the subject into two classes, and speak of a 
priori and empirical probability. 

(a) A priori probability. If all the ways of obtaining successes and 
failures can be analyzed into s possible mutually exclusive ways each 
of which is equally likely, and if x of these ways give successes, the 
probability of success in a single trial is x/s. 

A priori probability is concerned with that class of problems in
which the conditions affecting the event in question are fully known
beforehand. In other words, the problem may be set up
and solved abstractly. Thus the following problems are questions of
a priori probability: A box contains 4 white and 6 red billiard balls.
What is the probability that two drawn will be of the same color?
A coin is to be tossed 7 times. What is the probability that heads
will turn up at least 3 times? A sample of telephone receivers is
to be taken from a case containing 100 telephone receivers of which
20 are known to be defective. What is the probability that the sample
will contain exactly 2 defectives?
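The first two questions above can be settled by listing the equally likely cases and counting the favorable ones. The following is a minimal sketch in Python; the function names and the labels given to the balls are illustrative assumptions and do not come from the text. The third question is not computed because the text does not state the size of the sample.

```python
from fractions import Fraction
from itertools import combinations, product

def prob_same_color(white=4, red=6):
    """Probability that two balls drawn without replacement have the same color."""
    balls = ["W"] * white + ["R"] * red
    pairs = list(combinations(range(len(balls)), 2))   # equally likely unordered draws
    favorable = sum(1 for i, j in pairs if balls[i] == balls[j])
    return Fraction(favorable, len(pairs))

def prob_at_least_heads(tosses=7, at_least=3):
    """Probability of at least `at_least` heads in `tosses` throws of a fair coin."""
    outcomes = list(product("HT", repeat=tosses))      # 2**tosses equally likely sequences
    favorable = sum(1 for seq in outcomes if seq.count("H") >= at_least)
    return Fraction(favorable, len(outcomes))

print(prob_same_color())        # 7/15
print(prob_at_least_heads())    # 99/128
```

Both results are ratios x/s in the sense of definition (a): 21 favorable pairs out of 45, and 99 favorable sequences out of 128.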

There is another class of events in which it is impossible or impractical
to enumerate all of the equally likely ways in which the
event in question may succeed or fail. When this is the case it is 
necessary to estimate the probability by trial and observation. Thus 
we have 

(b) Empirical probability. If it is observed that an event has
occurred x times among s trials the ratio x/s is called the relative
frequency of success. The limit* of the ratio x/s as s is taken
indefinitely large is called the probability of success in a single trial.
In symbols we have

\[ \lim_{s \to \infty} \frac{x}{s} = p. \]

In statistical applications the limit of x/s cannot in general be determined,
but an observed relative frequency (s large) often provides
a valuable estimate of the underlying probability assumed in the
definition. For example, according to the American Experience
Mortality Table, out of 57,917 persons living aged 60, there are 1546
who die during the following year. Therefore, the relative frequency
1546/57,917 = .026693 is taken by insurance companies as the
probability that a person aged 60 will not survive another year.

*The student familiar with the theory of limits will realize that a rigorous
proof that a probability p exists as the limit of x/s as s increases would require
us to show that, given an $\epsilon > 0$, there exists a number N such that

\[ \left| \frac{x}{s} - p \right| < \epsilon \quad \text{for all } s \ge N. \]

It is of course obvious that we cannot prove the existence of this limit because
we cannot be sure that the difference $|x/s - p|$ will become and remain, as s
increases, less than any assigned positive number, no matter how small. For
example, after throwing a coin 10,000 times it is possible to get a run of all heads
in the next 1000 throws. In this connection Rietz says: “That the limit exists
is an empirical assumption whose validity cannot be proved, but experience with
data in many fields has given much support to the . . . usefulness of the assumption.”
(Mathematical Statistics, p. 8.)

We can, however, prove that the probability approaches certainty that x/s
will approach p as a limit as s is indefinitely increased. (See § 7.)
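The idea of estimating a probability by an observed relative frequency can be illustrated numerically. The sketch below, written in Python, first reproduces the mortality ratio quoted above and then simulates repeated trials of a fair coin; the simulation, the function name, and the fixed seed are assumptions made for illustration only and are not part of the text.

```python
import random

def relative_frequency(p, s, seed=0):
    """Simulate s independent trials with success probability p and return x/s."""
    rng = random.Random(seed)          # fixed seed so that the run is repeatable
    x = sum(1 for _ in range(s) if rng.random() < p)
    return x / s

# Mortality example from the text: 1546 deaths among 57,917 persons aged 60.
p_death = 1546 / 57917
print(round(p_death, 6))               # 0.026693

# Relative frequency of heads for an increasing number of simulated throws.
for s in (100, 10_000, 100_000):
    print(s, relative_frequency(0.5, s))
```

In a typical run the printed ratios settle near 0.5 as s grows, although, as the footnote points out, no finite run can prove that the limit exists.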

3. Theorems. We will now review from algebra certain elementary
formulas and theorems leading to the use of probability theory
in statistical problems. We will begin with the subject of permutations
and combinations.

A permutation is an arrangement of all or part of a set of things. 
A combination is a group of all or part of a set of things. A different 
permutation may be obtained by changing either the items or their 
order but a different combination may be obtained only by changing 
one or more of the items in the group. 

Theorem I. The number of permutations of n different things taken
r at a time is denoted by the symbol P(n, r) and given by

\[ P(n, r) = n(n-1)(n-2) \cdots (n-r+1). \]

Corollary. If the n items are not all different, there being $n_1$ of
type $T_1$, $n_2$ of type $T_2$, ..., $n_k$ of type $T_k$, then the number of distinct
permutations of the n items taken n at a time is

\[ \frac{n!}{n_1!\, n_2! \cdots n_k!} \]

where $\sum_{i=1}^{k} n_i = n$. The symbol n!, read “factorial n,” is defined by

\[ n! = n(n-1)(n-2) \cdots 3 \cdot 2 \cdot 1. \]

Theorem II. The number of combinations of n different things
taken r at a time is denoted by C(n, r) and given by

\[ C(n, r) = \frac{n(n-1) \cdots (n-r+1)}{r!} = \frac{n!}{r!\,(n-r)!}. \]

It will be understood that C(n, r) equals zero when r > n and equals 
one when r = n. 
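The formulas of Theorems I and II and of the Corollary may be evaluated directly. The sketch below uses the standard Python functions math.perm and math.comb (available in Python 3.8 and later); the helper distinct_permutations is a hypothetical name introduced here for the Corollary.

```python
import math

def distinct_permutations(counts):
    """n!/(n_1! n_2! ... n_k!) for items of k types with the given counts."""
    result = math.factorial(sum(counts))
    for c in counts:
        result //= math.factorial(c)   # each division is exact
    return result

print(math.perm(9, 4))                    # P(9, 4) = 9*8*7*6 = 3024
print(math.comb(8, 4))                    # C(8, 4) = 70
print(distinct_permutations([4, 7]))      # 11!/(4! 7!) = 330
print(math.comb(5, 7), math.comb(5, 5))   # 0 1
```

The last line agrees with the convention just stated: C(n, r) is zero when r > n and one when r = n.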

Theorem III. The total number of combinations of n different things
taken 1, 2, ..., or n at a time is $2^n - 1$.

Proof. The formula for C(n, r) is the coefficient of the (r + 1)st
term in the binomial expansion of $(x + y)^n$. Thus,

\[ (x + y)^n = x^n + C(n, 1) x^{n-1} y + C(n, 2) x^{n-2} y^2 + \cdots + C(n, r) x^{n-r} y^r + \cdots + y^n. \]

If we let x = y = 1 and transpose the first term, $x^n = 1$, this becomes

\[ 2^n - 1 = C(n, 1) + C(n, 2) + \cdots + C(n, r) + \cdots + C(n, n). \]
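Theorem III can be checked for small n by actually listing the selections. The following Python sketch is an illustration under that assumption; the function name is hypothetical.

```python
from itertools import combinations
from math import comb

def count_nonempty_selections(n):
    """Count selections of 1, 2, ..., n items from n distinct items by listing them."""
    items = range(n)
    return sum(1 for r in range(1, n + 1) for _ in combinations(items, r))

for n in range(1, 8):
    by_listing = count_nonempty_selections(n)
    by_formula = sum(comb(n, r) for r in range(1, n + 1))
    assert by_listing == by_formula == 2**n - 1

print(count_nonempty_selections(5))    # 31
```

The check is of course not a proof; it merely confirms the identity for n = 1 to 7.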

Events of a set are said to be mutually exclusive if the occurrence 
of any one of them on a particular occasion excludes the occurrence 
of any other on that occasion. They are said to be independent or 
dependent according as the occurrence of any one of them does not 
or does affect the occurrence of others in the set. 

If p is the probability that an event will happen in a single trial 
and q is the probability that the event will fail (to happen) in a single 
trial, then p + q = 1 and unity is the symbol for certainty. 

Theorem IV. The probability that one or other of a set of mutually 
exclusive events should happen when all of them are in question is the 
sum of the probabilities for the separate events. 

Theorem V. The probability that all of a set of independent events 
will happen on a given occasion when all of them are in question is the 
product of the probabilities for the separate events. 

Theorem VI. Suppose the events are dependent. Let $p_1$ be the
probability for the happening of a first event $E_1$ and $p_2$ be the probability
for the occurrence of a second event $E_2$ after $E_1$ has happened. Then the
probability that both events will happen in the order named is $p_1 p_2$.
The procedure may be extended in an obvious manner to any finite
number of events.
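Theorems IV, V, and VI may be illustrated with a box of 4 white and 6 red balls from which two balls are drawn. The Python sketch below is an illustration only; the variable names are assumptions, and exact fractions are used so that no rounding enters.

```python
from fractions import Fraction

white, red = 4, 6
total = white + red

# Theorem VI (dependent events): the second draw is made without replacing the first ball.
p_two_white = Fraction(white, total) * Fraction(white - 1, total - 1)
p_two_red = Fraction(red, total) * Fraction(red - 1, total - 1)

# Theorem IV (mutually exclusive events): "both white" and "both red" cannot occur together.
p_same_color = p_two_white + p_two_red

# Theorem V (independent events): the first ball is replaced before the second draw.
p_two_white_with_replacement = Fraction(white, total) ** 2

print(p_two_white, p_two_red, p_same_color)   # 2/15 1/3 7/15
print(p_two_white_with_replacement)           # 4/25
```

Replacing the first ball makes the two draws independent, which is why the product rule of Theorem V applies in the last computation while Theorem VI is needed in the first two.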

4. Supplementary Reading. It is suggested that the student look 
up the proofs of the above theorems in any college algebra text and 
review the discussions presented there. 

For the more advanced student the following references are recommended.
Some of the early chapters of the books may also be read
with profit by the beginning student. 

Books: 

The Mathematical Theory of Probabilities — Arne Fisher. 

Probability — Coolidge. 

Mathematical Statistics — Rietz. 

Choice and Chance — Whitworth. 

Probability and Its Engineering Uses — Fry. 

Elements of Probability — Levy and Roth. 

Introduction to Mathematical Probability — Uspensky. 

Papers: 

Fundamental Concepts in the Theory of Probability - Fry, American Mathematical
Monthly, vol. 41, 1934, p. 207.

On the Foundations of the Theory of Probability — Struik, Philosophy of Science, 
vol. 1, no. 1, January, 1934. 




Problems 


1. Prove both algebraically and verbally that
(a) P(n, r) = C(n, r)P(r, r), (b) C(n, r) = C(n, n - r).

2. From among nine men A, B, C, D, E, F, G, H, I, a committee of
four men will be chosen. The nine names will be written on nine
separate cards and four cards drawn at random one at a time from
a box.

(a) In how many different ways may the four cards come out?
Ans. 3024.

(b) How many different committees are possible not including the man A?
Ans. 70.

3. Consider the word “introduce.”

(a) In how many of the possible arrangements of all its letters will there be
a consonant in the first place? Ans. 201,600.

(b) From its letters how many four letter permutations consisting of three
vowels and one consonant can be formed? Ans. 480.

(c) If five of its letters are selected at random what is the probability that
two are vowels and three are consonants? Ans. 10/21.

4. On a table there are four different biographies with brown backs and seven
different novels with red backs.

(a) If all of the books are placed upright in a row on a shelf, in how many
different ways may they be arranged so that the orders of the colors
are different? Ans. 330.

(b) In how many different ways may two of the biographies and three of
the novels be selected and arranged on the shelf so that the orders of the
books are different? Ans. 25,200.

5. In a box there are five red billiard balls with the numbers 1, 2, 3, 4, 5, painted
on them (one on each ball), and three white billiard balls with the numbers
1, 2, 3, similarly painted on them. From the box a man draws two balls at
random.

(a) What is the probability that one of the balls drawn is white and the other
is red? Ans. 15/28.

(b) What is the probability that the two balls drawn have either the same
color or the same number? Ans. 4/7.

6. A bag contains four white, five red, and six black balls. Three are drawn
at random. Find the probability that (a) no ball drawn is black, (b)
exactly two are black, (c) all are of the same color.

7. An urn contains four white and five black balls. Three balls are drawn at
random and replaced by green balls. If then two balls are drawn at random,
what is the probability that they are all of the same color? Ans.
29/108.

8. Write out the expressions for C(n - 1, 2); C(n - 1, 3); C(s, x).

9. (a) Show that

\[ C(s-1, x-1) = \frac{(s-1)(s-2) \cdots (s-x+1)}{(x-1)!} = \frac{(s-1)!}{(x-1)!\,(s-x)!}. \]

(b) What is the value of the above expression when x = 1?






10. Write in expanded form:

(a) $\sum_{x=0}^{s} C(s, x)$.

(b) $\sum_{x=1}^{s} C(s-1, x-1)$.

11. Twelve cards have been dealt, six down, and the other six showing a jack,
two kings, a seven, a five, and a four. What is the probability that the
next card will be a four or less? (National Mathematics Magazine, vol.
XIII, no. 2, p. 94.)

12. From an urn containing ten balls, numbered from one to ten, balls are drawn,
one by one and placed in a row of holes, numbered from one to ten, each
ball being placed in the proper hole. What is the probability that there
will not be an empty hole between two filled ones at any time of the
drawing? (American Mathematical Monthly, vol. 45, no. 9, p. 635.)
Ans. 2/14,175.

5. Repeated Trials. We now consider a theorem which is very 
important both in the theory of probability and its applications in 
statistics. 

Theorem VII. Let p be the probability that an event will happen 
in a single trial, and q = 1 — p the probability that the event will fail 
in a single trial. Then the probability P that the event will happen 
exactly x times in s trials, during which p remains constant, is given by 
the (x + 1)st term of the binomial expansion:

\[ (q + p)^s = q^s + C(s, 1)\, p\, q^{s-1} + C(s, 2)\, p^2 q^{s-2} + \cdots + C(s, x)\, p^x q^{s-x} + \cdots + p^s. \tag{1} \]

Proof. By Theorem V, the probability that the event will happen
x times and fail the other s - x times in any specified order is
$p^x q^{s-x}$. But the number of ways in which the order may be specified
is C(s, x) or C(s, s - x). These ways are equally likely and mutually
exclusive. Therefore, by Theorem IV the required probability is
$C(s, x)\, p^x q^{s-x}$. We recognize this expression as the (x + 1)st term
of the binomial expansion of $(q + p)^s$.

Corollary 1. The probability that the event will happen at most
x times in s trials is the sum of all those terms of (1) in which the exponent
of p is equal to or less than x.

Corollary 2. The probability that the event will happen at least
x times in s trials is the sum of all those terms of (1) in which the exponent
of p is equal to or greater than x.

Proofs. By Theorem IV, the probability that the event will happen
at most x times is the sum of the probabilities that it will happen
0, 1, 2, 3, ..., x times. Similarly, the probability that the event will
happen at least x times is the sum of the probabilities that it will
happen s, s - 1, s - 2, ..., x times.
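Theorem VII and its two corollaries translate directly into a short computation. The Python sketch below is a minimal illustration; the function names and the sample values s = 4, p = 1/3 are assumptions chosen for the example and do not come from the text.

```python
from fractions import Fraction
from math import comb

def binomial_term(s, x, p):
    """Probability of exactly x successes in s trials: C(s, x) p^x q^(s-x)."""
    q = 1 - p
    return comb(s, x) * p**x * q**(s - x)

def at_most(s, x, p):
    """Corollary 1: sum of the terms of (1) whose exponent of p is <= x."""
    return sum(binomial_term(s, k, p) for k in range(0, x + 1))

def at_least(s, x, p):
    """Corollary 2: sum of the terms of (1) whose exponent of p is >= x."""
    return sum(binomial_term(s, k, p) for k in range(x, s + 1))

p = Fraction(1, 3)                     # illustrative value, not from the text
s = 4
print([str(binomial_term(s, x, p)) for x in range(s + 1)])
# ['16/81', '32/81', '8/27', '8/81', '1/81']
print(at_most(s, 1, p), at_least(s, 2, p))   # 16/27 11/27
```

Since "at most 1" and "at least 2" exhaust all possibilities, the two probabilities in the last line add to unity, as they must.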


Problems 


1. A dust storm contains particles of two kinds identical except as to color,
brown and yellow particles existing in the ratio 3:2. If five particles of
this dust enter my eye at random determine the probability that two of
them are brown and the other three are yellow. (See American Mathematical
Monthly, vol. 41, no. 5, May 1934.)

2. Six coins are tossed once, or what amounts to the same thing, one coin is
tossed six times. Find the probability of obtaining heads

(a) exactly three times
(b) at most three times
(c) at least three times
(d) at least once.

3. (a) What is the probability of throwing seven in a single toss of two dice?
(b) In six tosses of two dice find the probability of throwing seven at least once.

4. Toss six coins 64 times and record the number of times heads appear 0, 1, 2,
3, 4, 5, 6 times. (Instead of tosses, the coins may be shaken in a box.)
Compare the resulting distribution of frequencies with the terms of the
expansion of $64(\frac{1}{2} + \frac{1}{2})^6$.

5. A bag contains white and black balls in the proportion 2:3. Let the
drawing of a white ball be called a success. Three balls are
drawn separately and after each drawing the ball is returned to the bag
and thoroughly mixed with the others so that the fundamental probability
of success remains constant during the trials. Find the probabilities of
0, 1, 2, 3 successes. If this experiment were repeated 125 times what is
the theoretical frequency of each of the possible number of successes?

6. Show that equation (1) may be written:

\[ (q + p)^s = \sum_{x=0}^{s} \frac{s!}{x!\,(s-x)!}\, p^x q^{s-x}. \]

7. Show that

\[ \sum_{x=1}^{s} \frac{(s-1)!}{(x-1)!\,(s-x)!}\, p^{x-1} q^{s-x} = (q + p)^{s-1} = 1. \]


8. (a) Find the values of C(18, x) for x = 0 to x = 18 inclusive. (To the instructor:
Pascal’s Triangle provides a simple scheme for constructing a
table of binomial coefficients.)

(b) Evaluate $2^x/3^{18}$ for x = 0 to x = 18 inclusive.

(c) Show that $(\frac{1}{3} + \frac{2}{3})^{18}$ may be written

\[ \sum_{x=0}^{18} f(x) \quad \text{where } f(x) = C(18, x)\, 2^x / 3^{18}. \]

(d) Using the results of (a) and (b), find the values of f(x) for x = 0 to
x = 18. Save your results for future reference.





6. Relative Frequencies from Dichotomous Samples. Suppose a
sample of s individuals from the same population is divided into two
groups according as they have a certain attribute or not. Such a division
is said to be dichotomous. Out of s individuals we find that x
individuals have the attribute in question and s - x do not, it being
possible for x to take any integral value from 0 to s inclusive. The
attribute in question is frequently called the “event” and its occurrence
is called a “success.” The ratio x/s is called the relative
frequency of success.

Many illustrations of relative frequency come readily to mind. 
Out of 100 throws of a coin we may have noted 45 heads. From 
a group of school children, taken at random, we may find 55 
boys. Or again, we might make a certain disease of children the 
basis of a dichotomy. Out of 100 fifth grade school children we 
may find that 27/100 is the relative frequency of the occurrence of 
measles. 

7. Theorem of Bernoulli. The theorem of Bernoulli describes
the approach of the relative frequency x/s to the underlying constant
probability p as s increases. The theorem may be stated as
follows:

Theorem VIII. In a set of s trials in which the chance of success in
each trial is a constant p, the probability P approaches unity that the
relative frequency x/s will approach p as a limit as s increases indefinitely.*

Observe that this is a weaker statement than saying that p is the 
limit of x/s as the number of trials increases indefinitely. Another 
way of stating the theorem is as follows: The probability Q = 1 - P
of the difference (x/s - p) being numerically as large as any assigned
positive number $\epsilon$ will approach zero as a limit as s increases indefinitely.

The theorem is the basis for our definition of empirical probability. 
It is often regarded as a fundamental theorem of mathematical 
statistics because of the common use of x/s (s large) as a close approximation
to the probability p.
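The second form of the theorem can be examined numerically: for fixed p and a fixed positive number, the probability Q that x/s differs from p by at least that number may be computed exactly from the binomial terms and watched as s grows. The Python sketch below is an illustration only; the function name and the values p = 1/2 and 1/10 for the assigned number are assumptions, and exact fractions are used so that the comparison with the assigned number is not disturbed by rounding.

```python
from fractions import Fraction
from math import comb

def prob_deviation_at_least(s, p, eps):
    """Q = P(|x/s - p| >= eps) when x follows the terms of (q + p)^s."""
    q = 1 - p
    total = Fraction(0)
    for x in range(s + 1):
        if abs(Fraction(x, s) - p) >= eps:
            total += comb(s, x) * p**x * q**(s - x)
    return total

p = Fraction(1, 2)       # illustrative choices, not taken from the text
eps = Fraction(1, 10)
for s in (10, 100, 1000):
    print(s, float(prob_deviation_at_least(s, p, eps)))
```

For s = 10 the value is exactly 193/256; the later values shrink rapidly toward zero, in keeping with the statement of the theorem, though a table of values is of course no substitute for the proof.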

* A proof is given in Chapter VI, § 8.

8. Binomial Description of Frequency. The terms of $(q + p)^s$
are the theoretical relative frequencies for a dichotomous situation.
If we take N sets of s trials the theoretical absolute frequencies are
given by the terms of $N(q + p)^s$ when N is chosen so that these terms
are integers. It follows that N is merely a proportionality factor.
Hence we may say that if in a single trial the probability of an event
occurring is p and the probability of its not occurring is q, then if
a sample of s trials is taken, the frequencies with which the event
occurs 0, 1, 2, 3, ..., s times are proportional to the terms of the
point binomial $(q + p)^s$. This was the first theoretical distribution
to be established and a discussion of it is given in Ars Conjectandi by
J. Bernoulli which was published posthumously in 1713. A distribution
of discrete variates with frequencies proportional to the terms of
(1) is frequently referred to as a Bernoulli distribution.

In the Carus Monograph on Mathematical Statistics (p. 23) Professor
Rietz explains the applications and limitations of (1) in practical
statistics as follows:

Such a distribution . . . serves as a norm for the distributions of relative frequencies
obtained from some of the simplest sampling operations in applied statistics.
For example, the geneticist may regard the Bernoulli distribution (1) as the
theoretical distribution of the relative frequencies x/s of green peas which he
would obtain among random samples each consisting of a yield of s peas. The
biologist may regard (1) as the theoretical distribution of the relative frequencies
of male births in samples of s births. The actuary may regard (1) as the theoretical
distribution of yearly death rates in samples of s men of equal ages, say
of age 30, drawn from a carefully described class of men. In this case we specify
that the samples shall be taken from a carefully described class of men because
the underlying assumptions involved in (1) do not permit a careless selection of
data. Thus, it would not be in accord with the assumptions to take some of
the samples from a group of teachers with a relatively low rate of mortality and
others from a group of anthracite coal miners with a relatively high rate of
mortality. . . .

The expression “simple sampling” is sometimes applied to drawing a random
sample when the conditions for repetition just described are fulfilled. In other 
words, simple sampling implies that we may assume the underlying probability 
p of formula (1) remains constant from sample to sample, and that the drawings 
are mutually independent in the sense that the results of drawings do not depend 
in any significant manner on what has happened in previous drawings. 

9. Graphical Representation. A binomial distribution may be 
represented graphically by a histogram. This is accomplished by 
constructing rectangles centered at x = 0, 1, 2, ..., s with heights
proportional to the terms of the binomial. The different “successes”
denoted by x are the variates, and the corresponding terms of the 
binomial are the theoretical relative frequencies. 

Since the values of x constitute a discrete series it might seem more 
logical to represent the relative frequencies by ordinates instead of 
rectangles. However, since the base of each, rectangle is unity the 



Probability and Statistical Theory 


11 


number representing its height is also its area, and the representation 
by areas will be useful in our work. In a case like this the frequencies 
are said to be “ loaded ” on the 
ordinates at the mid-points of the 
class intervals. 

If we are thinking of relative fre¬ 
quencies or probabilities the sum of 
all the rectangles is unity, whereas 
if we •'are thinking of absolute fre¬ 
quencies the total area of the histo¬ 
gram is N. Thus if six coins are 
tossed 64 times the theoretical ab¬ 
solute frequencies are given by the 
terms of 64 (| + |) 6 . These are 1, 6, 

15, 20, 15, 6, 1 and their sum is 64. 
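The six-coin illustration can be checked with a short Python sketch. The function name `absolute_frequencies` is introduced here for illustration only; it evaluates the terms of N(q + p)^s directly.

```python
from math import comb

def absolute_frequencies(s, p, N):
    """Terms of N(q + p)^s for x = 0, 1, ..., s successes."""
    q = 1 - p
    return [N * comb(s, x) * p**x * q ** (s - x) for x in range(s + 1)]

# Six coins tossed 64 times: heights (and areas) of the seven rectangles.
freqs = absolute_frequencies(s=6, p=0.5, N=64)
print(freqs)
print(sum(freqs))  # total area of the histogram, which should equal N
```

Because each rectangle has unit base, the same numbers serve as heights and as areas.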

Fig. 1. Histogram of $(\tfrac12 + \tfrac12)^6$

10. The Mean and Standard Deviation. We have shown that the terms of $N(q + p)^s$ give the expected frequency of success (with respect to an attribute or character) in drawing N samples of s items in each sample, where p is the probability of a success. We now propose to characterize the distribution of expected frequencies by finding the usual moments. In this procedure we may consider the relative frequencies given by the terms of $(q + p)^s$ because the absolute frequencies are proportional to these terms, N being the proportionality factor. It will be convenient to evaluate first the $\nu$'s, taking the position of the first term as origin.

By definition

$$\bar{x} = \nu_1 = \frac{\sum x f(x)}{\sum f(x)}$$


where x refers to the number of successes and f(x) refers to the corresponding probabilities which are of course the theoretical relative frequencies. Table 1 shows the appropriate frequency table. It is obvious that the sum of the second column is unity. To sum the third column we factor out sp, obtaining

$$sp\left[q^{s-1} + (s-1)pq^{s-2} + C(s-1, 2)p^2q^{s-3} + \cdots + C(s-1, x-1)p^{x-1}q^{s-x} + \cdots + p^{s-1}\right]$$

which may be written $sp(q + p)^{s-1} = sp$. Hence, we have that the mean number of successes in s trials is $\bar{x} = sp$, where p is the probability of success in any trial. This result is often called the “mathematical expectation” or the “expected value” of x.

Table 1

x | f | xf
0 | $q^s$ | 0
1 | $spq^{s-1}$ | $spq^{s-1}$
2 | $\frac{s(s-1)}{2!}p^2q^{s-2}$ | $s(s-1)p^2q^{s-2}$
3 | $\frac{s(s-1)(s-2)}{3!}p^3q^{s-3}$ | $\frac{s(s-1)(s-2)}{2!}p^3q^{s-3}$
⋮
x | $\frac{s(s-1)\cdots(s-x+1)}{x!}p^xq^{s-x}$ | $\frac{s(s-1)\cdots(s-x+1)}{(x-1)!}p^xq^{s-x}$
⋮
s | $p^s$ | $sp^s$
Totals | $\sum f(x) = (q+p)^s = 1$ | $\sum xf(x) = sp(q+p)^{s-1} = sp$

Table 1 assists our intuitions but logically it is unnecessary. We could have proceeded as follows:

$$\nu_1 = \sum_{x=0}^{s} \frac{s!}{x!\,(s-x)!}p^xq^{s-x}x \div \sum_{x=0}^{s} \frac{s!}{x!\,(s-x)!}p^xq^{s-x}.$$

We observe that the divisor is unity and in the dividend we can divide x into x!. So,

$$\nu_1 = \sum_{x=1}^{s} \frac{s!}{(x-1)!\,(s-x)!}p^xq^{s-x}.$$

Factoring out sp, we have

$$\nu_1 = sp\sum_{x=1}^{s} \frac{(s-1)!}{(x-1)!\,(s-x)!}p^{x-1}q^{s-x} = sp(q + p)^{s-1}$$

whence we obtain

$$\bar{x} = sp. \tag{2}$$





We will use this procedure in finding the higher moments. Since

$$\sum f(x) = \sum_{x=0}^{s} C(s, x)p^xq^{s-x} = 1$$

we may omit it from the denominators of the rest of the $\nu$'s. By definition then

$$\nu_2 = \sum_{x=0}^{s} \frac{s!}{x!\,(s-x)!}p^xq^{s-x}x^2.$$

Writing $x^2 = x(x - 1) + x$, we have

$$\nu_2 = \sum_{x=0}^{s} \frac{s!}{x!\,(s-x)!}p^xq^{s-x}x(x-1) + \sum_{x=0}^{s} \frac{s!}{x!\,(s-x)!}p^xq^{s-x}x.$$

This simplifies into

$$s(s-1)p^2(q + p)^{s-2} + sp$$

so that we obtain

$$\nu_2 = s(s-1)p^2 + sp.$$

In order to get $\sigma$ we must know the second moment about sp. From the relation $\mu_2 = \nu_2 - (\nu_1)^2$ we easily find that

$$\mu_2 = spq$$

whence

$$\sigma = (spq)^{1/2}. \tag{3}$$

Example 1. Find the mean and standard deviation of the binomial $(\tfrac25 + \tfrac35)^5$ by means of formulas (2) and (3). Verify your results by the usual procedure for computing moments of a frequency distribution.

Solution. Here $p = \tfrac35$, $q = \tfrac25$, $s = 5$. By formulas (2) and (3), $\bar{x} = 3$, $\sigma = 1.095$.

Verification. $(\tfrac25 + \tfrac35)^5 = \tfrac{1}{5^5}[32 + 240 + 720 + 1080 + 810 + 243]$.

In finding the moments we may omit the proportionality factor $1/5^5$.

x | f | u
0 | 32 | -3
1 | 240 | -2
2 | 720 | -1
3 | 1080 | 0
4 | 810 | 1
5 | 243 | 2

We find $\sum f = 3125$, $\sum uf = 0$, $\sum u^2f = 3750$. Hence $\bar{u} = 0$, $\bar{x} = x_0 + c\bar{u} = 3$, $\mu_2 = 1.2$, $\sigma_x = \sigma_u = \sqrt{1.2}$.
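The verification in Example 1 can be repeated mechanically. The following Python sketch assumes the values s = 5 and p = 3/5 (the values that reproduce a mean of 3 and a standard deviation of 1.095) and uses exact fractions; the helper name `binomial_terms` is illustrative.

```python
from fractions import Fraction
from math import comb, sqrt

def binomial_terms(s, p):
    """Relative frequencies f(x) = C(s, x) p^x q^(s-x), x = 0, ..., s."""
    q = 1 - p
    return [comb(s, x) * p**x * q ** (s - x) for x in range(s + 1)]

s, p = 5, Fraction(3, 5)
f = binomial_terms(s, p)

# Moments computed "by the usual procedure" from the frequency table.
mean = sum(x * fx for x, fx in enumerate(f))
var = sum((x - mean) ** 2 * fx for x, fx in enumerate(f))

print(mean, s * p)               # both should give the mean, formula (2)
print(var, s * p * (1 - p))      # both should give the variance spq
print(round(sqrt(var), 3))       # standard deviation, formula (3)
```

Working with `Fraction` keeps the comparison with sp and spq exact rather than approximate.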





11. Skewness and Kurtosis. We shall now derive expressions for the third and fourth moments. By definition

$$\nu_3 = \sum_{x=0}^{s} \frac{s!}{x!\,(s-x)!}p^xq^{s-x}x^3.$$

Writing $x^3 = x(x - 1)(x - 2) + 3x^2 - 2x$, we have

$$\nu_3 = \sum_{x=0}^{s} \frac{s!}{x!\,(s-x)!}p^xq^{s-x}x(x-1)(x-2) + 3\sum_{x=0}^{s} \frac{s!}{x!\,(s-x)!}p^xq^{s-x}x^2 - 2\sum_{x=0}^{s} \frac{s!}{x!\,(s-x)!}p^xq^{s-x}x$$

$$= s(s-1)(s-2)p^3\sum_{x=3}^{s} \frac{(s-3)!}{(x-3)!\,(s-x)!}p^{x-3}q^{s-x} + 3[s(s-1)p^2 + sp] - 2sp$$

$$= s(s-1)(s-2)p^3 + 3s(s-1)p^2 + sp.$$

Similarly, by definition

$$\nu_4 = \sum_{x=0}^{s} \frac{s!}{x!\,(s-x)!}p^xq^{s-x}x^4.$$

Writing $x^4 = x(x - 1)(x - 2)(x - 3) + 6x^3 - 11x^2 + 6x$ and proceeding in a way analogous to that for evaluating $\nu_3$ we obtain

$$\nu_4 = s(s-1)(s-2)(s-3)p^4 + 6\nu_3 - 11\nu_2 + 6\nu_1.$$

Next we desire the moments about the mean, so that we may obtain expressions for skewness and kurtosis. From the relations

$$\mu_3 = \nu_3 - 3\nu_2\nu_1 + 2\nu_1^3$$

$$\mu_4 = \nu_4 - 4\nu_3\nu_1 + 6\nu_2\nu_1^2 - 3\nu_1^4$$

we obtain the quite simple results

$$\mu_3 = spq(q - p)$$

$$\mu_4 = spq[1 + 3(s - 2)pq].$$

Recalling that $\alpha_r = \mu_r/\sigma^r$ we have finally that

$$\alpha_3 = \frac{q - p}{(spq)^{1/2}}, \qquad \alpha_4 = 3 + \frac{1}{spq} - \frac{6}{s}.$$

We observe that none of these moments are subject to Sheppard’s 
corrections because the assumption that all the values are concen¬ 
trated at the mid-point of an interval is actually true in the case of 
a binomial distribution. This is obvious graphically since each 
frequency is represented by the middle of a rectangle. 

12. A Recursion Formula. The moments $\mu_k$ of a Bernoulli distribution can be obtained in an elegant manner by means of the recursion formula

$$\mu_{k+1} = pq\left[sk\,\mu_{k-1} + \frac{d\mu_k}{dp}\right]. \tag{4}$$

We know that $\mu_0 = 1$ and $\mu_1 = 0$, so the formula is to be used for $k \geq 1$. Thus for

k = 1, $\mu_2 = pq(s\mu_0 + 0) = spq.$

k = 2, $\mu_3 = pq[0 + (s - 2sp)] = spq(2q - 1) = spq(q - p).$

k = 3, $\mu_4 = pq[3s^2pq + s - 6sq + 6sq^2] = spq[1 + 3spq - 6pq] = spq[1 + 3(s - 2)pq].$


A simple proof of this formula has been given by A. T. Craig in the 
Bulletin of the American Mathematical Society , vol. 40, pp. 262-264. 
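The recursion can be carried out mechanically if each moment is stored as a polynomial in p. The Python sketch below takes the recursion in the form μ_{k+1} = pq[sk μ_{k-1} + dμ_k/dp] with q = 1 - p, represents polynomials as coefficient lists in ascending powers of p, and compares the result with direct summation. All helper names are illustrative, and s = 7, p = 1/3 are arbitrary test values.

```python
from fractions import Fraction
from math import comb

def poly_add(a, b):
    n = max(len(a), len(b))
    return [(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) for i in range(n)]

def poly_mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def poly_deriv(a):
    # Derivative with respect to p; a constant differentiates to zero.
    return [i * a[i] for i in range(1, len(a))] or [0]

def poly_eval(a, p):
    return sum(c * p**i for i, c in enumerate(a))

def central_moments_by_recursion(s, kmax):
    """Return [mu_0, ..., mu_kmax] as polynomials in p."""
    pq = [0, 1, -1]            # p(1 - p) = p - p^2
    mu = [[1], [0]]            # mu_0 = 1, mu_1 = 0
    for k in range(1, kmax):
        bracket = poly_add([s * k * c for c in mu[k - 1]], poly_deriv(mu[k]))
        mu.append(poly_mul(pq, bracket))
    return mu

def central_moment_direct(s, p, k):
    q = 1 - p
    return sum((x - s * p) ** k * comb(s, x) * p**x * q ** (s - x) for x in range(s + 1))

s = 7
mu = central_moments_by_recursion(s, 5)
p = Fraction(1, 3)
for k in range(6):
    print(k, poly_eval(mu[k], p), central_moment_direct(s, p, k))
```

Storing the moments as polynomials is one convenient design choice; a symbolic algebra package would serve equally well.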

To summarize, we have the important characterizing functions of a Bernoulli distribution:

Mean: $\bar{x} = sp$

Variance: $\sigma^2 = spq$

Skewness: $\alpha_3 = (q - p)/\sigma$

Kurtosis: $\alpha_4 = 1/\sigma^2 - 6/s + 3$

Excess: $\alpha_4 - 3 = (1 - 6pq)/spq.$


13. Mathematical Expectation. If a variable x may assume any one of a countable set of mutually exclusive values $x_1, x_2, \cdots, x_n$, in such a way that $f(x_i)$, which we take to be single-valued and non-negative, is the probability that x takes the value $x_i$ and such that $\sum_{i=1}^{n} f(x_i) = 1$, then x is called a chance variable and f(x) is defined as the probability function of the discrete variable x. If the mutually exclusive values are 0, 1, 2, 3, $\cdots$, s, an example of such a law of probability is

$$f(x) = C(s, x)p^xq^{s-x}.$$

A frequency distribution whose relative frequencies are given in accord with this law of probability is styled a Bernoulli distribution, as we have already observed.

Let the discrete variable x be subject to the law of probability f(x) and let g(x) be any function of x. The mathematical expectation of g(x), denoted by application of the operator E, is then defined to be

$$E[g(x)] = \sum_{i=1}^{n} g(x_i)f(x_i).$$

In particular, if g(x) = x then

$$E(x) = \sum_{i=1}^{n} x_if(x_i) = \nu_1 = \bar{x}$$

is the first moment, per unit frequency, about the origin. More generally, if $g(x) = x^k$, $(k = 1, 2, \cdots)$, then

$$E(x^k) = \sum_{i=1}^{n} x_i^kf(x_i) = \nu_k$$

is the kth moment about the origin. If $f(x) = C(s, x)p^xq^{s-x}$ and $g(x) = x^k$, then

$$E(x^k) = \sum_{x=0}^{s} x^kC(s, x)p^xq^{s-x} \tag{5}$$

defines the moments, about the origin, of a Bernoulli distribution. In particular for k = 1, we have

$$E(x) = sp. \tag{6}$$

If $g(x) = (x - sp)^k$ and $f(x) = C(s, x)p^xq^{s-x}$, then

$$E[(x - sp)^k] = \sum_{x=0}^{s} (x - sp)^kC(s, x)p^xq^{s-x}$$




is the kth moment about the mean. For k = 1, we see that E(x - sp) = 0, and for k = 2, we have

$$\sigma_x^2 = E(x - sp)^2 = E(x^2) - (sp)^2$$

which we have seen reduces to

$$E(x - sp)^2 = spq. \tag{7}$$

Equations (6) and (7) give the mean and variance with respect to the number of successes x in s trials. In some statistical investigations the data are expressed in terms of percentages or rates. When we may assume a constant probability underlying the frequency ratios obtained from observations we have a binomial distribution as before but on a different scale. Instead of the variable being x it is now x/s. In this case we have

$$E\left(\frac{x}{s}\right) = \frac{1}{s}E(x) = p. \tag{8}$$

For the analogous concept relating to the variance we have

$$E\left(\frac{x}{s} - p\right)^2 = \frac{1}{s^2}E(x - sp)^2 = \frac{pq}{s}. \tag{9}$$

Therefore, we see from (6) and (7) that the number of successes per set of s trials is distributed about an expected value of sp with a standard deviation of $(spq)^{1/2}$. From (8) and (9) we see that the percentage of successes in a set of s trials is distributed about an expected value of p with a standard deviation of $(pq/s)^{1/2}$.

In probability theory, the standard deviation is often called the standard error. It is important to observe that for a fixed value of p the standard error of x about sp increases as s increases and is proportional to $(s)^{1/2}$, whereas the standard error of x/s about p decreases as s increases, since it is proportional to $(1/s)^{1/2}$.
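The opposite behavior of the two standard errors is easy to see numerically. This Python sketch uses p = 1/2 and three sample sizes chosen only for illustration; `standard_errors` is a name introduced here.

```python
from math import sqrt

def standard_errors(s, p):
    """Return (standard error of x, standard error of x/s) for s trials."""
    q = 1 - p
    return sqrt(s * p * q), sqrt(p * q / s)

p = 0.5
table = {s: standard_errors(s, p) for s in (100, 400, 1600)}
for s, (se_count, se_rate) in table.items():
    print(f"s = {s:5d}   s.e. of x = {se_count:6.2f}   s.e. of x/s = {se_rate:.4f}")
```

Each fourfold increase in s doubles the standard error of the number of successes and halves that of the relative frequency.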

Exercises 

1. Expand the binomial N(§ + §)* for s =2 and s — 8. Find the theoretical 

frequencies in each case by taking N as the smallest number necessary to 
express the terms of each expansion as integers. 

2 . Find the mean and standard deviation for each of the above distributions 

using the appropriate formulas in (4). 

3. Find $\bar{x}$, $\sigma$, $\alpha_3$, $\alpha_4$ for each of the following binomials:

(i + *) 7 , (i + I) 4 , (i + I) 18 - 



4. For a certain binomial distribution $\sigma = 2.66$, $\alpha_3 = 0.318$. Find p, q, and s.

5. Assume that .04 is the theoretical rate of mortality in a certain age group. Suppose an insurance company is carrying s = 1000 such cases. What is the expected dispersion (standard error) in death rates from the theoretical rate p = .04? What would it be if s = 10,000?

6. The value of x for which $C(s, x)p^xq^{s-x}$ is the largest is called the mode of a Bernoulli distribution. Show that the mode is the positive integral value (or values) of x for which

$$sp - q \leq x \leq sp + p.$$

References: 

1. Mathematical Theory of Probabilities — Fisher, pp. 99-101. 

2. Mathematical Statistics — Rietz, p. 25. 

7. Suppose the law of distribution of the happening of an event in s successive 

trials is given by the terms of the expansion of 

$$(q + p)^s = \sum_{x=0}^{s} C(s, x)p^xq^{s-x} = \sum_{x=0}^{s} P_x.$$

(а) If s = 100 what values of p and q will make P 0 - P; P, = P 10 ? 

(б) Give approximate values of the P’s in (a). 

8 . A bag contains three one dollar bills and four five dollar bills. Three bills 

are drawn at random. For each one dollar bill withdrawn, three two 
dollar bills are returned to the bag, and for each five dollar bill that is 
drawn, a one and a two and a ten dollar bill are returned to the bag. A 
second drawing of two bills is made. Designate by x and y, respectively, 
the values of the first and second drawings, (a) Give in tabular form the 
probabilities for each of the possible simultaneous values of x and y. 
( b ) Evaluate E(x) and E(y). 

Solution. (a) The required probabilities are given in the body of the table following these exercises. The marginal totals are denoted by $g(x_i)$ and $h(y_j)$. The fact that $\sum_{i=1}^{n} g(x_i) = 1$ and $\sum_{j=1}^{m} h(y_j) = 1$ is a check on the computations. (b) $E(x) = \sum_{i=1}^{n} x_ig(x_i) = 26,910/2730 = \$9.86$, $E(y) = \sum_{j=1}^{m} y_jh(y_j) = 18,120/2730 = \$6.64$.

9. A bag contains three one dollar bills and two two dollar bills. Two bills are 

drawn at random. For each one dollar bill drawn two two dollar bills are 
returned to the bag, while for each two dollar bill drawn a one and a two 
dollar bill are returned to the bag. A second drawing of two bills is made. 
Designate by x and y, respectively, the values of the first and second drawings. Give in tabular form the probabilities for each of the possible simultaneous values of x and y. Find E(x) and E(y).

10. For the more advanced student: Read and report on the following article,
Urn Schemata as a Basis for the Development of Correlation Theory — Rietz, 
Annals of Mathematics, (2), vol. 21 (1920), p. 306. 




Table for Exercise 8. Each cell is the product of the probability of the first drawing x and the conditional probability of the second drawing y.

y \ x | 3 | 7 | 11 | 15 | $h(y_j)$
20 | 1/35 · 0 | 12/35 · 0 | 18/35 · 1/78 | 4/35 · 3/78 | 30/2730
15 | 1/35 · 0 | 12/35 · 3/78 | 18/35 · 4/78 | 4/35 · 3/78 | 120/2730
12 | 1/35 · 0 | 12/35 · 7/78 | 18/35 · 10/78 | 4/35 · 9/78 | 300/2730
11 | 1/35 · 0 | 12/35 · 2/78 | 18/35 · 8/78 | 4/35 · 18/78 | 240/2730
10 | 1/35 · 6/78 | 12/35 · 3/78 | 18/35 · 1/78 | 4/35 · 0 | 60/2730
7 | 1/35 · 36/78 | 12/35 · 21/78 | 18/35 · 10/78 | 4/35 · 3/78 | 480/2730
6 | 1/35 · 0 | 12/35 · 6/78 | 18/35 · 8/78 | 4/35 · 6/78 | 240/2730
4 | 1/35 · 36/78 | 12/35 · 21/78 | 18/35 · 10/78 | 4/35 · 3/78 | 480/2730
3 | 1/35 · 0 | 12/35 · 14/78 | 18/35 · 20/78 | 4/35 · 18/78 | 600/2730
2 | 1/35 · 0 | 12/35 · 1/78 | 18/35 · 6/78 | 4/35 · 15/78 | 180/2730
$g(x_i)$ | 78/2730 | 936/2730 | 1404/2730 | 312/2730 | 1
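The joint distribution of Exercise 8 can also be obtained by brute-force enumeration, which gives an independent check on the tabulated probabilities and on E(x) and E(y). The sketch below is only one way of organizing the count; the names `RETURNED`, `bag`, and `joint` are introduced for illustration.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import combinations

# Bills returned to the bag for each bill drawn, as stated in Exercise 8.
RETURNED = {1: [2, 2, 2], 5: [1, 2, 10]}
bag = [1, 1, 1, 5, 5, 5, 5]

joint = defaultdict(Fraction)   # (x, y) -> probability
first_draws = list(combinations(range(len(bag)), 3))
for drawn in first_draws:
    x = sum(bag[i] for i in drawn)
    new_bag = [b for i, b in enumerate(bag) if i not in drawn]
    for i in drawn:
        new_bag.extend(RETURNED[bag[i]])
    second_draws = list(combinations(range(len(new_bag)), 2))
    # Every (first draw, second draw) pair is equally likely.
    weight = Fraction(1, len(first_draws) * len(second_draws))
    for pair in second_draws:
        y = sum(new_bag[j] for j in pair)
        joint[(x, y)] += weight

E_x = sum(x * pr for (x, y), pr in joint.items())
E_y = sum(y * pr for (x, y), pr in joint.items())
print(float(E_x), float(E_y))
```

Bills are distinguished by position in the list, so identical denominations are counted as separate equally likely objects, which is what the combinatorial argument requires.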


14. Approximating the Binomial with the Normal Curve. If we plot the terms of $(q + p)^s$ as ordinates against the values of $x/\sqrt{s}$ as abscissas and draw the corresponding histogram we find that it approaches a smooth curve as s is taken larger and larger. Thus in Figure 2 (where the vertical sides of the rectangles are omitted since they contribute nothing to the interpretation) we see how the staircase outline of the histogram approaches close to a continuous curve as s is taken larger.

From the expressions already found for $\alpha_3$ and $\alpha_4$ we see that $\alpha_3 \to 0$ and $\alpha_4 \to 3$ as $s \to \infty$. This suggests the possibility of approximating the binomial with the normal curve. As a matter of fact, it can be proved, under certain conditions of approximation, that $(q + p)^s$ approaches the normal curve as a limit as $s \to \infty$. The proof* will not be given here but a word or two about it may be appropriate. In using the normal curve to approximate the binomial we are particularly interested in a range of three or four standard deviations from the mean. This fact suggests the reasonableness of assuming that the number of successes x' above or below sp be considered as the same order of magnitude as $\sigma$. This means that $x'/(spq)^{1/2}$ shall remain finite as $s \to \infty$. Now $(spq)^{1/2}$ is of order $(s)^{1/2}$ if neither p nor q is extremely small. Hence the propriety of assuming (in the proof) that $x'/(s)^{1/2}$ shall remain finite. This is the reason for plotting the histograms (Figure 2) in terms of $x/(s)^{1/2}$.

We may expect, therefore, that the fitted normal curve will give a fair approximation to the binomial except possibly at the extremities of the range. When the terms of the binomial are arranged symmetrically with respect to the mean, that is, when p = q, the approximation is rather better than otherwise.

* The following references are recommended:

Mathematical Statistics, Rietz, pp. 32-35.

Probability and Its Engineering Uses, Fry, pp. 207-213.

Annals of Mathematical Statistics, vol. 1, p. 197.





Exercise

Fit a normal curve to the binomial $(\tfrac13 + \tfrac23)^{18}$. Directions: This binomial may be written

$$\sum_{x=0}^{18} f(x) \quad\text{where}\quad f(x) = C(18, x)\frac{2^x}{3^{18}}.$$

(See Problem 8, § 5.) Next recall that the equation of the normal curve is

$$y = \frac{N}{\sigma}\phi(t)$$

where

$$\phi(t) = \frac{1}{\sqrt{2\pi}}e^{-t^2/2} \quad\text{and}\quad t = \frac{x - \bar{x}}{\sigma}.$$

If we set N = 1, $\bar{x} = sp$, and $\sigma = (spq)^{1/2}$ we shall expect that y will give, approximately, the values of f(x) for the various values of x. As in Chapter VI of Part I the following outline is suggested for organizing the computations.

x | t | $\phi(t)$ | y | f(x)

Construct the histogram and draw the curve. It is suggested that paper ruled “20 to the inch” be used. By comparing the last two columns and also judging from the figure, does the fit seem to be good, even though s is rather small and $q = \tfrac12 p$?
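The computation outlined in this exercise can be organized as in the following Python sketch. It assumes that the binomial of the exercise has s = 18 with p = 2/3 and q = 1/3 (so that q is one-half of p); the variable names are illustrative, and the columns follow the suggested outline x, t, φ(t), y, f(x).

```python
from math import comb, exp, pi, sqrt

def phi(t):
    """Ordinate of the normal curve in standard units."""
    return exp(-t * t / 2) / sqrt(2 * pi)

s, p = 18, 2 / 3            # assumed reading of the exercise
q = 1 - p
mean, sigma = s * p, sqrt(s * p * q)

rows = []
for x in range(s + 1):
    t = (x - mean) / sigma
    y = phi(t) / sigma                       # fitted normal ordinate with N = 1
    f = comb(s, x) * p**x * q ** (s - x)     # exact binomial term
    rows.append((x, t, phi(t), y, f))
    print(f"{x:2d}  {t:7.3f}  {phi(t):.5f}  {y:.5f}  {f:.5f}")

max_gap = max(abs(y - f) for _, _, _, y, f in rows)
print("largest |y - f(x)| =", round(max_gap, 4))
```

Comparing the last two printed columns plays the role of comparing the curve with the histogram.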

The above exercise will help the student appreciate a theorem which will now be introduced. The sum of successive terms of the binomial equals the area of the corresponding rectangles in its histogram. We may obtain an approximation to this sum by finding the area under the fitted normal curve which these rectangles occupy. Graphically, the values $x = 0, 1, 2, \cdots, s$ are the mid-points of the bases of these rectangles. Therefore, if we are summing the terms of the binomial in which x ranges from $x = d_1$ to $x = d_2$, inclusive, the corresponding area under the curve will be from $x = d_1 - \tfrac12$ to $x = d_2 + \tfrac12$. We must convert these values into standard units in order to enter a table of areas of the normal curve. Hence we have the following theorem.

Theorem IX.* The sum of those terms of the binomial $(q + p)^s$ in which the number of successes x ranges from $d_1$ to $d_2$, inclusive, is approximately

$$Q = \int_{t_1}^{t_2} \phi(t)\,dt$$

where

$$t_1 = \frac{d_1 - \tfrac12 - sp}{\sigma}, \qquad t_2 = \frac{d_2 + \tfrac12 - sp}{\sigma}, \qquad \sigma = (spq)^{1/2}.$$

* Sometimes called the De Moivre-Laplace Theorem.


Example 2. In tossing six coins what is the probability of obtaining 2, 3, 4, or 5 heads?

Solution. We have sp = 3, $\sigma^2 = \tfrac32$, $d_1 = 2$, $d_2 = 5$. Hence, $t_1 = -1.5/(\tfrac32)^{1/2} = -1.225$, $t_2 = 2.5/(\tfrac32)^{1/2} = 2.041$. Therefore,

$$Q = \int_{-1.225}^{2.041} = \int_0^{1.225} + \int_0^{2.041} = .38971 + .47932 = .869.$$

Although the use of Theorem IX assumes s large we obtain here with s small a good approximation to the exact value $Q = \tfrac78 = .875$. In this example it would have been a simple matter to evaluate and sum the terms of the binomial but when s is large and the range from $d_1$ to $d_2$ includes many terms this procedure may be very laborious. When this is the case the above theorem gives an approximation which may be quite satisfactory. The approximation is good if $d_1$ lies on one side of the mean and $d_2$ on the other at approximately equal distances.
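The comparison made in Example 2 between the normal approximation and the exact sum can be reproduced with the Python sketch below. The half-unit adjustments at each end of the range follow the reasoning about the mid-points of the rectangles; the function names are illustrative, and the normal area is obtained from the standard library error function rather than from a printed table.

```python
from math import comb, erf, sqrt

def Phi(t):
    """Area under the standard normal curve to the left of t."""
    return 0.5 * (1 + erf(t / sqrt(2)))

def binomial_sum_normal(s, p, d1, d2):
    """Normal approximation to the sum of binomial terms for d1 <= x <= d2."""
    sigma = sqrt(s * p * (1 - p))
    t1 = (d1 - 0.5 - s * p) / sigma
    t2 = (d2 + 0.5 - s * p) / sigma
    return Phi(t2) - Phi(t1)

def binomial_sum_exact(s, p, d1, d2):
    q = 1 - p
    return sum(comb(s, x) * p**x * q ** (s - x) for x in range(d1, d2 + 1))

approx = binomial_sum_normal(6, 0.5, 2, 5)
exact = binomial_sum_exact(6, 0.5, 2, 5)
print(round(approx, 3), exact)
```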



Example 3. Suppose p = .2 is the probability of success in a single trial. Estimate the probability of obtaining less than five or more than fifteen successes in fifty trials.

Solution. The required probability, indicated by the shaded area in Figure 3, is P = 1 - Q where Q is the probability of obtaining more than 4 and less than 16 successes. In using Theorem IX, we have

sp = 10, $\sigma = 2.828$, $t_1 = -1.944$, $t_2 = 1.944$.

Therefore,

$$P = 1 - 2\int_0^{1.944} \phi(t)\,dt = .0519.$$

The exact probability is obtained by evaluating and adding the sixth to the sixteenth terms of $(.8 + .2)^{50}$ and subtracting the result from unity. However, instead of computing these terms separately, a systematic procedure may be set up by which each term is made to depend upon the preceding term. Thus we may write a binomial as follows:

$$(q + p)^s = q^s(1 + k)^s = q^s\left\{1 + sk + \frac{s(s-1)}{2!}k^2 + \frac{s(s-1)(s-2)}{3!}k^3 + \cdots + k^s\right\}$$

where $k = p/q$. Then $q^s$ may be computed by logarithms and its product with the terms in the brackets may be obtained on computing machines by a continuous process. Thus for the terms within the brackets,

the second term is first term multiplied by $sk$,

the third term is second term multiplied by $\dfrac{s-1}{2}k$,

the fourth term is third term multiplied by $\dfrac{s-2}{3}k$,

the rth term is (r - 1)st term multiplied by $\dfrac{s - (r-2)}{r-1}k$.

In this way we find Q = .9497, so the required probability is P = .0503. For most practical purposes the approximation by use of Theorem IX would be satisfactory.
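The continuous process described in Example 3, in which each term is obtained from the preceding one, translates directly into a short loop. The Python sketch below is an illustration under the stated values s = 50 and p = .2; `bracket_terms` is a hypothetical helper name, and the computed Q may be compared with the value quoted in the example.

```python
from math import comb

def bracket_terms(s, k):
    """Terms of (1 + k)^s, each computed from the preceding term."""
    terms = [1.0]
    for r in range(2, s + 2):                     # r-th term from the (r-1)-st
        terms.append(terms[-1] * (s - (r - 2)) / (r - 1) * k)
    return terms

s, p = 50, 0.2
q = 1 - p
terms = [q**s * t for t in bracket_terms(s, p / q)]   # terms[x] = C(s,x) p^x q^(s-x)

Q = sum(terms[5:16])      # sixth through sixteenth terms, x = 5, ..., 15
P = 1 - Q
print(round(Q, 4), round(P, 4))
```

Only one multiplication and one division are needed per term, which is the economy the method was designed for on a desk computing machine.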

Example 4. Find the probability that in throwing 100 coins one will obtain a number of heads which will differ from the expected number by less than five.

Solution.

$$t_1 = -\frac{4.5}{5} = -.9, \qquad t_2 = \frac{4.5}{5} = .9.$$

So the required probability is given by

$$Q = \int_{-.9}^{.9} \phi(t)\,dt = .6319.$$

Example 5. In the binomial $(.95 + .05)^{30}$ where p = .05 is the probability of success in a single trial, find the probability of as many as seven successes.

Fig. 4. First Seven Terms of $(.95 + .05)^{30}$

Solution. This binomial is too skew for a good fit with the normal curve, so the first seven terms of the expansion are evaluated. (See Figure 4.) Their sum is .9994 and this is the probability for less than seven successes. Therefore the probability for seven or more successes is .0006.




15. Simple Sampling of Attributes. It is a matter of common
experience that certain fluctuations between observation and expecta¬ 
tion under a given hypothesis may be explained on the basis of chance. 
For example, in throwing 100 coins an observed result of 45 heads 
and 55 tails does not warrant the conclusion that the coins are biased. 
In such cases a very natural question arises as to what sampling 
deviations may be allowed before we conclude that they indicate the 
operation of definite and assignable causes, i.e., that the results are 
inconsistent with the given hypothesis. The theory dealing with 
such fluctuations in relative frequencies is called sampling of attri¬ 
butes. 

Suppose we are given a sample of s individuals of which x have 
a certain character or attribute. The question then arises: Is this 
result consistent with the hypothesis that the sample is drawn from 
a population having the fraction p with the given character? Could 
it reasonably have arisen on the basis of chance or is it significant of 
other than chance factors? In answering this question our common- 
sense judgment is greatly aided by a probability scale for chance 
fluctuations under the given hypothesis. We therefore restate our 
question* more precisely as follows: 

Suppose the probability of an event is known from theoretical considerations to be equal to p. What is the probability that in s trials the number of successes will differ numerically from the expected number $\bar{x} = sp$ by as much as (or more than) an observed amount d?

The required probability may be estimated by means of the following corollaries to Theorem IX.

Corollary 1. The probability that the number of successes x in s trials will differ from the expected number $\bar{x} = sp$ by more than |d| is approximately given by $P_\delta = 1 - Q_\delta$ where

$$Q_\delta = 2\int_0^{\delta} \phi(t)\,dt \quad\text{and}\quad \delta = \frac{d + \tfrac12}{\sigma}.$$

Corollary 2. If the words “more than” in Corollary 1 be replaced by “as much as,” then $\delta = \dfrac{d - \tfrac12}{\sigma}$.

The proofs are obvious if we admit that the normal curve fits the histogram of the point binomial.

* See Problems in Sampling — Camp, Journal American Statistical Associa¬ 
tion, p. 964, December, 1923. 




In another slightly different form involving relative frequencies, $Q_\delta$ gives approximations to the probability that the difference between an observed relative frequency of success x/s and the true probability p satisfies the relation

$$\left|\frac{x}{s} - p\right| < \delta\left(\frac{pq}{s}\right)^{1/2}$$

for every assigned positive value of $\delta$.



In using Corollary 1, Table 2 gives a general idea of the magnitudes 
of probabilities for certain deviations. It is divided into two sections: 
the first section lists probabilities for specially selected deviations, 
the second section lists deviations for specially selected prob¬ 
abilities. 

A computed probability is used to scale our judgment as to whether 
the deviation in question can be explained on the basis of chance. 


Table 2. Abridged Normal Probability Scale

Deviation $\delta$ | Chance of Deviation Outside $\pm\delta$ | Deviation $\delta$ | Chance of Deviation Outside $\pm\delta$
0.5 | .617 | .67 | .50
1.0 | .317 | 1.28 | .20
1.5 | .134 | 1.64 | .10
2.0 | .046 | 1.96 | .05
2.5 | .0124 | 2.33 | .02
3.0 | .0027 | 2.58 | .01
3.5 | .00047 | 2.88 | .004

If it cannot be so explained, it is said to be “significant” of other than chance causes. In passing judgment on a deviation it is sometimes difficult to give a definite answer. Good judgment in these matters only comes from much experience in the particular field. However, we shall not often be wrong if we draw the following conventionalized conclusions about $P_\delta$ for a deviation outside $\pm\delta$:

If $P_\delta \geq .05$, $\delta$ is not significant,

If $P_\delta < .01$, $\delta$ is significant,

If $.05 > P_\delta > .01$, our conclusion about $\delta$ is doubtful and we cannot say with much certainty whether the deviation is significant or not until we have more information.

We see from Table 2 that this rule allows chance fluctuations to explain a deviation from the expected value of as much as 2.58 in standard units. In some situations it may be desirable to extend this range and place the bounds of chance fluctuations at $\delta = \pm 3$. There is then a correspondingly greater degree of certainty that deviations outside these limits are significant.
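The entries of an abridged normal probability scale of this kind can be regenerated from the complementary error function, since the chance of a deviation outside ±δ in standard units equals erfc(δ/√2). The Python sketch below is illustrative; `chance_outside` and `deviation_for` are names introduced here, and the inverse lookup uses simple bisection rather than a printed table.

```python
from math import erfc, sqrt

def chance_outside(delta):
    """Probability that a normal deviate falls outside +/- delta standard units."""
    return erfc(delta / sqrt(2))

def deviation_for(prob, lo=0.0, hi=10.0):
    """Deviation delta whose two-sided tail probability equals prob (bisection)."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if chance_outside(mid) > prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for delta in (0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5):
    print(f"delta = {delta:3.1f}   chance outside = {chance_outside(delta):.5f}")
for prob in (0.50, 0.20, 0.10, 0.05, 0.02, 0.01):
    print(f"chance = {prob:4.2f}   delta = {deviation_for(prob):.2f}")
```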

Example 6. (Rietz) A group of scientific men reported 1705 sons and 1527 daughters. The examination of these numbers brings up the following fundamental questions of simple sampling. Do these data conform to the hypothesis that $\tfrac12$ is the probability that a child to be born will be a boy? That is, can the deviations be reasonably regarded as fluctuations in simple sampling under this hypothesis? In another form, what is the probability in throwing 3232 coins that the number of heads will differ from (3232/2) = 1616 by as much as d = 1705 - 1616 = 89?

Solution. s = 3232, $(pqs)^{1/2} = 28.425$, $\delta = \dfrac{88.5}{28.425} = 3.113$,

$$P_\delta = 1 - 2\int_0^{3.113} \phi(t)\,dt = 1 - .9981 = .0019.$$

Hence we conclude that these data cannot be explained on the basis of chance, i.e., they are inconsistent with the hypothesis that $\tfrac12$ is the probability of a male birth.
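The arithmetic of Example 6 follows the “as much as” form of the corollary, in which the observed deviation is reduced by one-half before conversion to standard units. A minimal Python sketch, with an illustrative function name and a flag for choosing between the two corollaries:

```python
from math import erfc, sqrt

def two_sided_chance(s, p, d, as_much_as=True):
    """Return (delta, P) for a deviation d from the expected number sp.

    as_much_as=True uses delta = (d - 1/2)/sigma; False uses (d + 1/2)/sigma.
    """
    sigma = sqrt(s * p * (1 - p))
    delta = (d - 0.5) / sigma if as_much_as else (d + 0.5) / sigma
    return delta, erfc(delta / sqrt(2))

delta, P = two_sided_chance(s=3232, p=0.5, d=89)
print(round(delta, 3), round(P, 4))
```

Since P falls below .01, the conventional rule classes the deviation as significant.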

16. Probable Error. The word error is technically used in statistics to denote a deviation from the expected value. The deviation $\delta$ for which $P_\delta = .5$ is commonly called “probable error.” This term is misleading because it is not the most probable error. Equally likely deviation would be a more appropriate name for it.

From the normal probability scale we find that this deviation is $\delta = .6745$ in standard units or $.6745\sigma$ in arbitrary units. Hence for a normal distribution, probable error is equivalent to the quartile deviation which, in Part I, we have called E in x units and s in standard units. In other words, the probability is one-half that a variate chosen at random will have a value within the range $E(x) \pm .6745\sigma_x$. This definition of probable error combines the assumption of a normal distribution with the specification of an even wager.

Used as a scale unit along the x-axis, probable error is sometimes simply defined as a yardstick which is approximately $\tfrac23\sigma_x$. This definition does not impose the condition that the distribution necessarily follow the normal curve. But there is no real gain in the removal of this condition if, for an interpretation of the significance of such a deviation, we must refer to a normal probability scale. That is, in testing the significance of a discrepancy between an observed value and the expected value there is no merit in expressing that discrepancy in multiples of approximately $\tfrac23\sigma$ instead of $\sigma$ itself. It would seem that the language of probable error should be abandoned.

17. Standard Error and Correlation of Errors in Class Frequencies. When the probability distribution of a variable is known the expected frequency in any class interval may be determined. Suppose we have obtained from a random sample of a Bernoulli distribution an observed frequency distribution. The variates, N in number, should be distributed into n class intervals containing $f'_1, f'_2, \cdots, f'_n$ each. Instead of this suppose we find $f_1, f_2, \cdots, f_n$ where

$$\sum f_i = N = \sum f'_i.$$

Let Table 3 represent the two distributions.

Table 3

Class | Class Mark | Observed Frequency | Theoretical Frequency
1 | $x_1$ | $f_1$ | $f'_1$
2 | $x_2$ | $f_2$ | $f'_2$
⋮
i | $x_i$ | $f_i$ | $f'_i$
⋮
n | $x_n$ | $f_n$ | $f'_n$

Suppose next that a large number of such samples of N variates each are obtained under the same essential conditions. The observed and expected distributions will not agree in practice unless the samples are distributed exactly as the universe from which they are drawn. In the above table, the x's are to be regarded simply as compartments and do not change. Only the frequencies change from sample to sample. Any class frequency $f_s$ will vary from sample to sample, and these values of $f_s$ will form a frequency distribution.

It is important in certain problems to have an expression for the expected value of the variance $\sigma_{f_s}^2$ of this distribution in terms of observed values. To derive this expression we let $p_s = f'_s/N$ be the probability that a variate will fall in the class s and $q_s = 1 - p_s$ be the probability that it will fall elsewhere. Then, considering the N variates as observations or trials, the theoretical distribution of frequency in this class will be given by $(q_s + p_s)^N$ and the square of the standard deviation of $f_s$ in the theoretical distribution is given by

$$\sigma_{f_s}^2 = Np_sq_s.$$

If we accept the observed relative frequency $f_s/N$ as an approximation to $p_s$ then we have

$$\sigma_{f_s}^2 = N\,\frac{f_s}{N}\left(1 - \frac{f_s}{N}\right)$$

which reduces to

$$\sigma_{f_s}^2 = f_s\left(1 - \frac{f_s}{N}\right) \tag{10}$$

as an approximate* value of the desired expression.

* When the sample is small, researches have shown that a better approximation can be obtained by multiplying the right side of (10) by N/(N - 1). See Rietz, Mathematical Statistics, pp. 120-122.

We will next consider the correlation between deviations from the expected values of the frequencies in any two classes, say the sth and tth. Let $\delta f_s$ be a deviation from the expected value or theoretical mean of the sth class corresponding to a deviation $\delta f_t$ from the expected value of the tth class. Since the total frequency is N, $N - f_s$ is the frequency which is distributed in classes other than the s class. If we obtain an excess $\delta f_s$ in the s class then $-\delta f_s$ must be distributed among the other classes. If deviations from the expected values are due only to random sampling fluctuations it is reasonable to assume that $-\delta f_s$ is distributed among the other classes in proportion to their expected frequencies. Therefore, as the contribution from the $f_t$ class, we have the proportion $f_t/(N - f_s)$ and the number $(-\delta f_s)f_t/(N - f_s)$.

If the mean value of $\delta f_t$ equals $-\delta f_s f_t/(N - f_s)$, for $\delta f_s$ assigned, then $-f_t/(N - f_s)$ must be the regression coefficient of $\delta f_t$ on $\delta f_s$. Therefore,

$$-\frac{f_t}{N - f_s} = r_{\delta f_t \delta f_s}\frac{\sigma_{\delta f_t}}{\sigma_{\delta f_s}} = r_{f_t f_s}\frac{\sigma_{f_t}}{\sigma_{f_s}}$$

so that

$$r_{f_t f_s}\sigma_{f_t}\sigma_{f_s} = -\frac{f_t}{N(1 - p_s)}Np_s(1 - p_s) = -f_tp_s = -\frac{f_sf_t}{N}.$$

Hence we have the result

$$r_{f_s f_t}\sigma_{f_s}\sigma_{f_t} = -\frac{f_sf_t}{N}. \tag{11}$$

Clearly, $r_{\delta f_t \delta f_s} = r_{f_t f_s}$ and $\sigma_{\delta f_s}^2 = \sigma_{f_s}^2$, $\sigma_{\delta f_t}^2 = \sigma_{f_t}^2$, since the $\delta$'s measure deviations from their expected frequencies.*
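For a small case the variance of a class frequency and the product moment of the deviations in two classes can be computed exactly from the multinomial distribution, which offers a check that does not rely on the proportional-distribution assumption. The Python sketch below uses N = 6 variates and three classes with probabilities 1/2, 1/3, 1/6; these values and all names are illustrative assumptions. The expected results are the standard multinomial facts var = N p_s q_s and covariance = -N p_s p_t.

```python
from fractions import Fraction
from math import factorial

def multinomial_probability(counts, probs):
    """Probability of observing the given class frequencies."""
    coefficient = factorial(sum(counts))
    probability = Fraction(1)
    for c, p in zip(counts, probs):
        coefficient //= factorial(c)
        probability *= p**c
    return coefficient * probability

N = 6
probs = (Fraction(1, 2), Fraction(1, 3), Fraction(1, 6))
p_s, p_t = probs[0], probs[1]

# Every way of distributing N variates among the three classes.
outcomes = [(a, b, N - a - b) for a in range(N + 1) for b in range(N + 1 - a)]
weights = {c: multinomial_probability(c, probs) for c in outcomes}

mean_s = sum(c[0] * w for c, w in weights.items())
mean_t = sum(c[1] * w for c, w in weights.items())
var_s = sum((c[0] - mean_s) ** 2 * w for c, w in weights.items())
var_t = sum((c[1] - mean_t) ** 2 * w for c, w in weights.items())
cov_st = sum((c[0] - mean_s) * (c[1] - mean_t) * w for c, w in weights.items())
r_squared = cov_st**2 / (var_s * var_t)

print(var_s, N * p_s * (1 - p_s))
print(cov_st, -N * p_s * p_t)
print(r_squared, p_s * p_t / ((1 - p_s) * (1 - p_t)))
```

The covariance is negative, as the argument about an excess in one class being offset in the others suggests.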

For an application of the above formula and the Bernoulli Theory 
in general see The Use of Statistical Techniques in Certain Problems 
of Market Research — Brown. Publication of the Graduate School 
of Business Administration, Harvard University, vol. XXII, no. 
3, 1935. 

18. The Poisson Exponential. If p (or q) is small the normal 
curve cannot ordinarily be used with confidence to approximate the 

* The correlation of errors here is properly a multivariate problem depending 
on the multinomial distribution. The argument given above indicates the 
plausibility of the result but it is not to be construed as a rigorous proof. By 
means of more advanced mathematics the correlation coefficient can be proved 
to have the result found without making use of the assumption that any excess 
frequency in one class is distributed among the other classes in proportion to 
their frequencies. In other words, the assumption is superfluous. 





terms of the binomial $(q + p)^s$. If s is large but sp is in the
neighborhood where x is small, a useful approximation to

$$f(x) = \frac{s!}{x!\,(s - x)!}\,p^x q^{s-x} \tag{12}$$


may be given by means of the Poisson exponential function. Statistical
examples of this situation are sometimes called rare events
and occur in widely different fields; for example, the number born
blind per year in a large city, the number of organisms of a given
size S on a given glass slide that escape death by X-rays after being
exposed for t seconds, the number of times in a certain year that the
volume of trading on the New York Stock Exchange exceeds M
million shares, the frequency of certain “peaks” in a given time
interval such as occur in telephone “traffic,” and other problems in
demands for services.

Suppose, then, that p is the probability for the occurrence of 
the rare event in question and assume that q = 1 — p is nearly 
unity. Let s be so large that s ! and (s — x) ! may be replaced 
by their Stirling approximations [cf. (12) of Chapter II]. Making 
these replacements, (12) becomes 


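$$f(x) = \frac{s^{s+1/2}\,e^{-s}\,p^x q^{s-x}}{x!\,(s - x)^{s-x+1/2}\,e^{-s+x}}. \tag{13}$$

Writing the second factor in the denominator of (13) in the form
$s^{s-x+1/2}(1 - x/s)^{s-x+1/2}$, it is readily seen that (13) becomes

$$f(x) = \frac{(sp)^x e^{-x}(1 - p)^{s-x}}{x!\,(1 - x/s)^{s-x+1/2}}.$$

Now when x is small and s is large,*

$$\left(1 - \frac{x}{s}\right)^{s-x+1/2} \doteq \left(1 - \frac{x}{s}\right)^{s} \doteq e^{-x}$$

and

$$(1 - p)^{s-x} \doteq (1 - p)^{s} \doteq e^{-sp}.$$

* The symbol $\doteq$ is used to mean “approximately equal.”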






For the required approximation we have, therefore,

$$f(x) \doteq \frac{m^x e^{-m}}{x!} \tag{14}$$

where m = sp. This is Poisson’s exponential function. It is tabulated
for various values of m and x in Tables for Statisticians and
Biometricians. The terms of the series

$$e^{-m}\left(1 + m + \frac{m^2}{2!} + \frac{m^3}{3!} + \cdots + \frac{m^x}{x!}\right) \tag{15}$$

give the probability of exactly 0, 1, 2, ···, or x occurrences of the
rare event in s trials. It is worthy of note that the Poisson exponential
has only one parameter, m, whereas the normal curve has two
parameters, the mean and $\sigma$.

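Certain simple and interesting results may be obtained for the
moments of the distribution given by (14) when x takes all integral
values from x = 0 to x = s. First we observe that when x = s
in (15) we have

$$e^{-m}\left(1 + m + \frac{m^2}{2!} + \cdots + \frac{m^s}{s!}\right) \doteq e^{-m}e^{m} = 1,$$

approximately if s is large. Then

$$E(x) = \nu_1 = \sum_{x=0}^{s} x f(x) = me^{-m}\left[1 + m + \frac{m^2}{2!} + \cdots + \frac{m^{s-1}}{(s-1)!}\right] \doteq me^{-m}e^{m} = m = sp, \text{ approximately.}$$

And

$$\nu_2 = \sum_{x=0}^{s} x^2 f(x) = \sum_{x=0}^{s} [x(x - 1) + x] f(x) \doteq m(m + 1), \text{ approximately.}$$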





From these results, we have

$$\text{Mean} = m = sp, \qquad \mu_2 = m(m + 1) - m^2 = m, \qquad \therefore\ \sigma = m^{1/2}.$$

It may also be shown that

$$\nu_3 = m(m^2 + 3m + 1), \qquad \nu_4 = m(m^3 + 6m^2 + 7m + 1)$$

whence we find that

$$\mu_3 = m, \qquad \mu_4 = 3m^2 + m$$

and

$$\alpha_3 = \frac{1}{m^{1/2}}, \qquad \alpha_4 = 3 + \frac{1}{m}.$$

It is a rather striking result that each of the mean, variance, and
$\mu_3$ is equal to m.
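
The approximation (14) and the moment results above can be checked numerically. The following is a minimal sketch; the values s = 1000 and p = .002 and the truncation of the series at 60 terms are assumptions chosen for illustration, not values from the text.

```python
from math import comb, exp, factorial


def binomial_term(s, p, x):
    """Exact term (12) of the binomial (q + p)^s."""
    return comb(s, x) * p ** x * (1 - p) ** (s - x)


def poisson_term(m, x):
    """Poisson exponential (14) with m = sp."""
    return m ** x * exp(-m) / factorial(x)


def poisson_moments(m, terms=60):
    """Mean and the moments mu_2, mu_3, mu_4 about the mean, summed from the series."""
    probs = [poisson_term(m, x) for x in range(terms)]
    mean = sum(x * f for x, f in enumerate(probs))

    def about_mean(k):
        return sum((x - mean) ** k * f for x, f in enumerate(probs))

    return mean, about_mean(2), about_mean(3), about_mean(4)


s, p = 1000, 0.002
m = s * p
for x in range(6):
    print(x, round(binomial_term(s, p, x), 5), round(poisson_term(m, x), 5))

print(poisson_moments(m))
```

The last line should return values close to m, m, m, and 3m^2 + m, in agreement with the formulas for the mean, $\mu_2$, $\mu_3$, and $\mu_4$.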

The importance of the Poisson approximation in dealing with 
certain problems in telephone engineering and other fields is dis¬ 
cussed in Fry’s book, Probability and Its Engineering Uses. The 
interested student might investigate and prepare a special report on 
some of these applications. 


Problems 

1. Use Theorem VIII to approximate the following sums: 

(a) the terms of (i + f )^90 in which 50 ≤ x ≤ 70.

(b) the terms of (.946 + .054)^521 in which x > 34.

2. Fit a normal curve to the point binomial (1/2 + 1/2)^4.

3. Fit a normal curve to (f -f |) 6 . 

4. Suppose you are studying IQ’s and it is known that 20% in the universe with
which you are dealing have an IQ below M, so that 1/5 is the probability
that an individual chosen at random has an IQ below M. (M itself has
no bearing on the solution of the problem.) If a teacher had a class of
fifty which could be regarded as a random sample from this universe,
would it be exceptional if she found less than five or more than fifteen
with IQ’s below M? (See Example 3.)

5. Vital statistics gathered over a long period of time indicate that 5% of
patients suffering from a certain disease die from that disease. Suppose
that out of 30 cases examined in a certain city seven deaths were
reported. Was this unusual? (See Example 5.)







6. (Camp) A dean’s report showed the following figures:

                    Honor Grades         Failures         Number
Subject            Number      %      Number      %      Examined
German               187      36        33       6.3        521
Mathematics          162      35        38       8.2        466
Music                 11      50         0       0.0         22
All Subjects                  38                 5.4



Taking p = .38 for honor grades and p = .054 for failures find the
probability: (a) that in selecting at random (from a supposedly infinite
number), one would obtain as few honor grades as were obtained in German;
(b) as many failures; (c) in selecting 466 at random, one would obtain as
few honor grades as were obtained in mathematics; (d) as many failures;
(e) in selecting 22 at random, one would obtain no failures (as in music);
(f) eleven or more honor grades.

Hints. (a) Find sum of terms of (.62 + .38)^521 in which x ≤ 187.

(b) See Problem 1 (b) above.

(e) Evaluate (.946)^22 by logarithms.

7. ( Burgess) If analyzed past experience shows that 4% of all insured white 

males of exact age 65 have died within a year, and it is found that 60 of a 
similar group of 1000 actually die within a year, should the group be re¬ 
garded as essentially different from the general mass — that is, is the 
departure from the expected mortality greater than might be expected as 
a result of chance variation alone? 

8 . ( Richardson ) In a coin tossing experiment in which a coin was tossed 400 

times, 250 heads appear. Do you believe the experiment was honestly 
performed? 

9. (Lovitt and Holtzclaw) Would you be willing to bet 10 to 1 that an opponent
could not throw the sum 7 with two dice at least 23 times in a hundred
throws with two dice?

10. (Lovitt and Holtzclaw) The 1919 report of the Census Bureau in its bulletin
on Mortality Statistics shows the average death rate from tuberculosis (all
forms) for the period 1906-1910 to be 163.5 per 100,000 of population
and σ = 12.78.

In the following instances is the variation from the average such as to 
justify one in constructing a theory as to the causes of this variation? 


California        210.4
Colorado          244.2
Michigan           99.7
N. Y. Bronx       445.7
Scranton, Pa.      97.4





11. A sociologist who is interested in the characteristics of a certain race which
we will call R, hit on the idea of trying to sort R’s from non-R’s in the
writings of unknown persons. Accordingly he persuaded a colleague to
let him have 64 examination papers, with names removed, from psychology
classes at Blank University. On 43 of these papers he correctly spotted
the students as R’s or non-R’s. In 21 cases he missed. Find the
probability of this performance having resulted from pure chance.

12. A coin is tossed s times. It is desired that the relative frequency of the
appearance of heads shall not be greater than .51 or less than .49. Find
the smallest value of s that will insure the above results with a degree of
certainty $Q_\delta \ge .90$.

Solution. We must determine s such that $Q_\delta = .90$ (at least) that

$$\left|\frac{x}{s} - \frac{1}{2}\right| \le .01.$$

We have

$$\delta = .02\sqrt{s}$$

since p = q = 1/2. Also

$$Q_\delta = \int_{-\delta}^{\delta} \phi(t)\,dt = .90$$

whence from the tables, we find $\delta = 1.645$. Therefore,

$$.02\sqrt{s} = 1.645$$

and

$$s = 6765.$$


13. A coin is tossed s times. It is desired that the relative frequency of the 

appearance of heads shall not be greater than .502 or less than .498. Find 
the smallest value of s that will insure the foregoing results with a degree 
of certainty P > Jf- 

14. (Camp) A census report showed that in general 59.58% of New York City 

children went to school, but that only 56.8% of the negro children went 
to school. The number of negro children was 20,000. Was the difference 
due to chance? 

15. Read and give a report on the reference given at the end of § 17.

16. Find applications of the Poisson exponential function in the literature and 
report on them in class. 




CHAPTER II 

SOME USEFUL INTEGRALS AND FUNCTIONS 

To avoid interruption later on we will discuss here certain integrals 
and functions which will be useful in subsequent chapters. 

1. The Gamma Function. The improper integral

$$\Gamma(n) = \int_0^\infty x^{n-1} e^{-x}\,dx, \qquad n > 0, \tag{1}$$

is called the Gamma function of the positive number n. The difference
equation

$$\Gamma(n + 1) = n\Gamma(n) \tag{2}$$

is easily established from (1) by integration by parts (see the chapter
on the Gamma function in any textbook on advanced calculus).

By successive reduction of (2) we obtain

$$\Gamma(n + 1) = n(n - 1) \cdots (n - k)\,\Gamma(n - k)$$

where k is a positive integer less than n. If n is also a positive
integer and k = n − 1 then we have

$$\Gamma(n + 1) = n! \tag{3}$$

since from (1), $\Gamma(1) = 1$. Because of (3) the Gamma function is
sometimes called the factorial function. It may be considered as a
generalization of n! when n is fractional. The graph of the function
defined in (1) is shown in Figure 6. It can be drawn from the following
values, some of which follow immediately from (2) and the others will be
established later.

$$\Gamma(0) = \infty, \quad \Gamma(\tfrac{1}{2}) = \pi^{1/2}, \quad \Gamma(1) = 1, \quad \Gamma(2) = 1, \quad \Gamma(3) = 2, \quad \Gamma(4) = 6.$$

[Fig. 6. Graph of $\Gamma(n)$; horizontal axis n from 0 to 4.]






Other forms of (1) may be obtained by changes of variable. For
example,

$$\Gamma(n) = 2\int_0^\infty y^{2n-1} e^{-y^2}\,dy, \qquad \text{by } x = y^2. \tag{4}$$

From this form we can show that

$$\int_0^\infty e^{-y^2}\,dy = \tfrac{1}{2}\pi^{1/2}. \tag{5}$$

To establish (5) we first observe from (4) that

$$\Gamma(\tfrac{1}{2}) = 2\int_0^\infty e^{-y^2}\,dy. \tag{6}$$

Since (6) is independent of the variable of integration, we may also
write

$$\Gamma(\tfrac{1}{2}) = 2\int_0^\infty e^{-x^2}\,dx.$$

So

$$[\Gamma(\tfrac{1}{2})]^2 = 4\int_0^\infty e^{-x^2}\,dx \int_0^\infty e^{-y^2}\,dy = 4\int_0^\infty\!\!\int_0^\infty e^{-(x^2 + y^2)}\,dx\,dy, \tag{7}$$

the passage from the product of two integrals to the double integral
being valid since neither the limits nor the integrand of either integral
depend on the variable in the other.

To evaluate (7) it will be convenient to change to polar coordinates.
First, however, we will make a few remarks about a change
of variables in general. Let x and y be the coordinates of a point
with respect to a set of rectangular axes in a plane, u and v the
coordinates of another point with respect to a similarly chosen set
of rectangular axes in some other plane. Suppose we have a function
of the variables (x, y),

$$z = f(x, y),$$

and we make x and y depend on new variables u and v by the relations

$$x = g(u, v) \quad \text{and} \quad y = h(u, v).$$





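These relations establish a certain correspondence between the
points of the two planes. Let dA be an element of area for the
function f(x, y). Then it is shown in advanced calculus * that

$$dA = dx\,dy = J\!\left(\frac{x, y}{u, v}\right) du\,dv$$

where $J\!\left(\frac{x, y}{u, v}\right)$ is a convenient symbol for the absolute value of
the determinant

$$\begin{vmatrix} \dfrac{\partial x}{\partial u} & \dfrac{\partial x}{\partial v} \\[2mm] \dfrac{\partial y}{\partial u} & \dfrac{\partial y}{\partial v} \end{vmatrix} = \frac{\partial x}{\partial u}\frac{\partial y}{\partial v} - \frac{\partial x}{\partial v}\frac{\partial y}{\partial u}$$

and the latter is called the Jacobian or functional determinant of the
transformation.

If, then, we change (7) to polar coordinates by letting

$$x = r\cos\theta, \qquad y = r\sin\theta, \tag{8}$$

the Jacobian is

$$\begin{vmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{vmatrix} = r.$$

Therefore, the element of integration dx dy becomes $r\,dr\,d\theta$. The
limits of integration are now from 0 to $\infty$ for r and from 0 to $\pi/2$
for $\theta$. From (8), $x^2 + y^2 = r^2$. So (7) becomes †

$$[\Gamma(\tfrac{1}{2})]^2 = 4\int_0^{\pi/2}\!\!\int_0^\infty e^{-r^2} r\,dr\,d\theta = 2\int_0^{\pi/2} d\theta = \pi.$$

* See Mathematical Analysis, Goursat-Hedrick, vol. 1.

† The transformation to polar coordinates and subsequent integration
involves a remainder term T which is the integral
over an area between a quadrant of radius R and a square of
side R. But it can be shown that $T \to 0$ as $R \to \infty$. (Cf.
Wilson’s Advanced Calculus, p. 364.)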








Hence,

$$\Gamma(\tfrac{1}{2}) = \pi^{1/2} \tag{9}$$

and from (9) and (6) we obtain (5).

For a more general form of (5) we may let $y = t/(2k)^{1/2}$, k > 0,
and obtain

$$\int_0^\infty e^{-t^2/2k}\,dt = \tfrac{1}{2}(2\pi k)^{1/2}, \tag{10}$$

and

$$\int_{-\infty}^{\infty} e^{-t^2/2k}\,dt = (2\pi k)^{1/2}. \tag{10a}$$

An alternate derivation of (9) may be given as follows. The right-hand
member of (7) represents the volume V under the bell-shaped surface

$$z = e^{-(x^2 + y^2)} \tag{11}$$

and so from (7) we have $\Gamma(\tfrac{1}{2}) = V^{1/2}$. Since (11) is
a surface of revolution we may take as the element of
volume a cylindrical shell of radius r, thickness dr,
and height z. Then

$$dV = 2\pi r\,dr\,z = 2\pi r e^{-r^2}\,dr,$$

$$V = 2\pi\int_0^\infty e^{-r^2} r\,dr = \pi,$$

and consequently we obtain (9).
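
Results (5), (9), and (10a) are easily checked by numerical integration. The following is a minimal sketch, assuming a simple midpoint rule with the infinite range cut off where the integrand is negligible; the helper name, the cut-off points, and the value k = 2.5 are illustrative choices only.

```python
from math import exp, gamma, pi, sqrt


def midpoint_integral(func, lower, upper, steps=100_000):
    """Midpoint-rule approximation to the integral of func over (lower, upper)."""
    width = (upper - lower) / steps
    return width * sum(func(lower + (i + 0.5) * width) for i in range(steps))


# Equation (5): the integral of exp(-y^2) over (0, infinity), cut off at y = 10.
half_gaussian = midpoint_integral(lambda y: exp(-y * y), 0.0, 10.0)
print(half_gaussian, sqrt(pi) / 2)

# Equation (9): Gamma(1/2) from the library compared with pi^(1/2).
print(gamma(0.5), sqrt(pi))

# Equation (10a) with k = 2.5, using symmetry and a cut-off at t = 60.
k = 2.5
full_range = 2.0 * midpoint_integral(lambda t: exp(-t * t / (2.0 * k)), 0.0, 60.0)
print(full_range, sqrt(2.0 * pi * k))
```

Each printed pair should agree to several decimal places.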

2. Stirling’s Approximation. An asymptotic expression, that is,
an approximation with small percentage error, may be obtained
for n! when n is large. The following formula

$$n! \doteq (2\pi)^{1/2}\,n^{n+1/2}\,e^{-n} \tag{12}$$

is called Stirling’s approximation. A closer approximation is

$$n! \doteq (2\pi)^{1/2}\,n^{n+1/2}\,e^{-n}\left(1 + \frac{1}{12n} + \cdots\right).$$

However, the first term usually gives sufficiently close approximations
if n is fairly large. A derivation of (12) may be found in
several places. Among these are

Probability and Its Engineering Uses — Fry, D. Van Nostrand Company; and 
Introduction to Mathematical Probability — Uspensky, McGraw-Hill Company. 
Seven-place tables of log n ! up to n = 1000 are given in Glover's Tables . 
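
The percentage error of (12) can be examined directly for moderate n. The following is a minimal sketch; the function names are illustrative, and the correction factor uses only the first term 1/(12n) of the closer approximation.

```python
from math import exp, factorial, pi, sqrt


def stirling(n):
    """Stirling's approximation (12) to n!."""
    return sqrt(2.0 * pi) * n ** (n + 0.5) * exp(-n)


def stirling_corrected(n):
    """Stirling's approximation multiplied by the factor (1 + 1/(12n))."""
    return stirling(n) * (1.0 + 1.0 / (12.0 * n))


for n in (5, 10, 20, 50):
    exact = factorial(n)
    print(n, exact / stirling(n), exact / stirling_corrected(n))
```

The ratios approach unity as n increases, which is the sense in which the approximation is asymptotic; the difference between n! and its approximation, on the other hand, grows with n.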

3. The Beta Function. The definite integral

$$B(m, n) = \int_0^1 x^{m-1}(1 - x)^{n-1}\,dx \tag{13}$$

is called the Beta function of any two positive numbers m and n.
Another useful form is

$$B(m, n) = 2\int_0^{\pi/2} \sin^{2m-1}\theta\,\cos^{2n-1}\theta\,d\theta \tag{14}$$

which is obtained by letting $x = \sin^2\theta$ in (13).

If we let x = 1 − y, (13) becomes

$$B(m, n) = \int_0^1 (1 - y)^{m-1} y^{n-1}\,dy = \int_0^1 (1 - x)^{m-1} x^{n-1}\,dx = B(n, m).$$

Therefore, m and n may be interchanged.

A relation between the Beta and Gamma functions may be obtained
as follows. From (4) we may write

$$\Gamma(n)\Gamma(m) = 4\int_0^\infty x^{2n-1} e^{-x^2}\,dx \int_0^\infty y^{2m-1} e^{-y^2}\,dy = 4\int_0^\infty\!\!\int_0^\infty x^{2n-1} y^{2m-1} e^{-(x^2 + y^2)}\,dx\,dy.$$

Since the region of integration is the first quadrant of the xy-plane
we have, upon changing to polar coordinates,

$$\Gamma(n)\Gamma(m) = 4\int_0^\infty\!\!\int_0^{\pi/2} r^{2(m+n-1)} e^{-r^2} \sin^{2m-1}\theta\,\cos^{2n-1}\theta\; r\,d\theta\,dr$$

$$= 4\int_0^{\pi/2} \sin^{2m-1}\theta\,\cos^{2n-1}\theta\,d\theta \int_0^\infty r^{2(m+n)-1} e^{-r^2}\,dr$$

$$= B(m, n)\,\Gamma(m + n),$$






by (14) and (4). Hence

$$B(m, n) = \frac{\Gamma(m)\Gamma(n)}{\Gamma(m + n)}. \tag{15}$$
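
Relation (15) may be verified numerically by comparing the integral (13) with the quotient of Gamma functions. The following is a minimal sketch, assuming a midpoint rule and the illustrative values m = 2.5, n = 3.5 (chosen so that the integrand is finite at both ends of the range); the function names are not from the text.

```python
from math import gamma


def beta_by_integration(m, n, steps=200_000):
    """Midpoint-rule approximation to the Beta integral (13)."""
    width = 1.0 / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * width
        total += x ** (m - 1.0) * (1.0 - x) ** (n - 1.0)
    return total * width


def beta_by_gamma(m, n):
    """Right-hand member of (15)."""
    return gamma(m) * gamma(n) / gamma(m + n)


print(beta_by_integration(2.5, 3.5), beta_by_gamma(2.5, 3.5))
print(beta_by_gamma(2.5, 3.5), beta_by_gamma(3.5, 2.5))
```

The second line of output illustrates the symmetry B(m, n) = B(n, m).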

4. Reduction to Gamma and Beta Functions. By appropriate 
changes of variables many of the integrals that occur in statistics 
may be evaluated by expressing them in terms of Gamma and Beta 
functions. 

Examples 

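(a) Prove that

$$\int_0^\infty y^{N-1} e^{-Ny^2/2\sigma^2}\,dy = \frac{1}{2}\left(\frac{2\sigma^2}{N}\right)^{N/2}\Gamma\!\left(\frac{N}{2}\right).$$

Solution. This integral may be written

$$\frac{1}{2}\int_0^\infty (y^2)^{(N-2)/2} e^{-Ny^2/2\sigma^2}\,d(y^2).$$

By the substitution

$$x = \frac{Ny^2}{2\sigma^2}, \qquad d(y^2) = \frac{2\sigma^2}{N}\,dx,$$

it becomes

$$\frac{1}{2}\left(\frac{2\sigma^2}{N}\right)^{N/2}\int_0^\infty x^{(N-2)/2} e^{-x}\,dx = \frac{1}{2}\left(\frac{2\sigma^2}{N}\right)^{N/2}\Gamma\!\left(\frac{N}{2}\right).$$

(b) Determine k so that

$$k\int_0^\infty (s^2)^{(N-3)/2} e^{-Ns^2/2\sigma^2}\,d(s^2) = 1.$$

Solution. By the substitution

$$x = \frac{Ns^2}{2\sigma^2}, \qquad d(s^2) = \frac{2\sigma^2}{N}\,dx,$$

we obtain

$$k\left(\frac{2\sigma^2}{N}\right)^{(N-1)/2}\int_0^\infty x^{(N-3)/2} e^{-x}\,dx = 1,$$

whence

$$k = \frac{1}{\Gamma\!\left(\frac{N-1}{2}\right)}\left(\frac{N}{2\sigma^2}\right)^{(N-1)/2}.$$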










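(c) Determine K so that

$$K\int_{-\infty}^{\infty} (1 + z^2)^{-N/2}\,dz = 1.$$

Solution. By the substitution $z = \tan\theta$ this becomes
$2K\int_0^{\pi/2} \cos^n\theta\,d\theta$ where n = N − 2. From Exercise 9 below we find that

$$\int_0^{\pi/2} \cos^n\theta\,d\theta = \frac{1}{2}B\!\left(\frac{n + 1}{2}, \frac{1}{2}\right)$$

whence

$$K = \frac{1}{B\!\left(\frac{N-1}{2}, \frac{1}{2}\right)}.$$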


5. Incomplete Beta and Gamma Functions. The integral

$$\Gamma_x(n + 1) = \int_0^x e^{-x} x^n\,dx \tag{16}$$

is called the incomplete Gamma function. Similarly

$$B_x(m, n) = \int_0^x x^{m-1}(1 - x)^{n-1}\,dx \tag{17}$$

is called the incomplete Beta function. Both (16) and (17) are
useful functions in mathematical statistics and they have been
tabulated by Karl Pearson and his staff at the Biometric Laboratory,
University College, London. They are published by the Cambridge
University Press.

Exercises 

1. Show that the Gamma function becomes infinite when n = 0. Hint. From
(2) you can obtain

$$\Gamma(n + k) = (n + k - 1) \cdots (n + 1)\,n\,\Gamma(n),$$

that is

$$\Gamma(n) = \frac{\Gamma(n + k)}{n(n + 1) \cdots (n + k - 1)}.$$

2. Show that $\int_{-\infty}^{\infty} \phi(t)\,dt = 1$ where $\phi(t) = \dfrac{1}{(2\pi)^{1/2}}\,e^{-t^2/2}$.

3. Prove that $\Gamma(\tfrac{1}{2}) = (\pi)^{1/2}$.






4. Evaluate $\int_{-2}^{\infty} e^{-3x}(1 + x/2)^7\,dx$ by transforming it into a Gamma function.

Hint. Let cy = 1 + x/2 and determine c so that $e^{-3x} = ke^{-y}$.

Ans. $(e^6\,7!)/3(6^7)$.

5. Evaluate $\int_6^{\infty} e^{-2x}(x - 6)^7\,dx$. Ans. $e^{-12}\,2^{-8}\,7!$.

6. Evaluate $\int_0^\infty e^{-x} x^{1/2}\,dx$, given that $\int_0^\infty e^{-x} x^{-1/2}\,dx = (\pi)^{1/2}$.

Hint. $\Gamma(\tfrac{3}{2}) = \tfrac{1}{2}\Gamma(\tfrac{1}{2})$.

7. Find the difference and the ratio between the exact value of 10 ! and the 

approximate value obtained by using Stirling’s formula. 

8. Using (15) show that 



9. Show that $\int_0^{\pi/2} \cos^m\theta\,d\theta = \tfrac{1}{2}B[(m + 1)/2, \tfrac{1}{2}]$. Hint. Use (14).

10. Given that $f(n) = n^{1/2}B(n/2, \tfrac{1}{2})$, show that $\lim_{n \to \infty} f(n) = (2\pi)^{1/2}$.



CHAPTER III 


GENERAL CONCEPT OF DISTRIBUTION FUNCTION OF A CONTINUOUS 
VARIABLE. GENERALIZED FREQUENCY CURVES 

1. Fundamental Notions and Definitions. The notion of distribu¬ 
tion functions relates to theoretical universes. The concept is an 
idealization of observed distributions comparable to the idealization 
of the outlines of material objects into the straight lines and circles 
of geometry. 

A continuous variable x is said to have the distribution function
f(x), which we take to be single-valued and non-negative, if the
frequency of occurrence of x in the range a < x < b is measured by

$$\int_a^b f(x)\,dx. \tag{1}$$

If x has the distribution function f(x) with total frequency N, then

$$\int_{-\infty}^{\infty} f(x)\,dx = N, \tag{2}$$

and y = f(x) is called a theoretical frequency curve or, more briefly,
a frequency curve. If the actual occurrence of the variable is limited
to a finite range, f(x) is defined to be identically zero outside that
range. If the total area under the curve is taken as unity, so that

$$\int_{-\infty}^{\infty} f(x)\,dx = 1, \tag{3}$$

then y = f(x) is variously called the probability density, the
probability distribution, or the probability function of x. Then, f(x) dx
gives, to within infinitesimals of order higher than that of dx, the
probability that x lies in the interval (x, x + dx). Under condition
(3), the integral (1) denotes the probability that x lies in the interval
(a, b). Under condition (2), (1) denotes the frequency of values in
the interval (a, b). A distribution function can be regarded, therefore,
either as a frequency curve or as a probability curve according
as condition (2) or (3) is imposed. The distinction can be adjusted
by determining appropriately a constant factor in y = f(x).



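2. Moments. If x is distributed in accord with the frequency
curve y = f(x), with total frequency N, the moment of order k about
the y-axis is defined by

$$\nu_k = \frac{1}{N}\int_{-\infty}^{\infty} x^k f(x)\,dx. \tag{4}$$

In particular, for k = 1 we have the mean, $\nu_1 = \bar{x}$,

$$\bar{x} = \frac{1}{N}\int_{-\infty}^{\infty} x f(x)\,dx.$$

If the mean is taken as the origin of measurement, so that

$$\int_{-\infty}^{\infty} (x - \bar{x}) f(x)\,dx = 0,$$

then the moment of order k about the mean is defined by

$$\mu_k = \frac{1}{N}\int_{-\infty}^{\infty} (x - \bar{x})^k f(x)\,dx. \tag{5}$$

In particular, when k = 2 we have the variance, $\mu_2 = \sigma^2$,

$$\sigma^2 = \frac{1}{N}\int_{-\infty}^{\infty} (x - \bar{x})^2 f(x)\,dx.$$

The $\mu$’s can be expressed in terms of the $\nu$’s by the relation

$$\mu_k = \nu_k - C(k, 1)\nu_{k-1}\nu_1 + C(k, 2)\nu_{k-2}\nu_1^2 - \cdots + (-1)^r C(k, r)\nu_{k-r}\nu_1^r + \cdots + (-1)^{k-1}[C(k, k - 1) - 1]\nu_1^k \tag{6}$$

where $C(k, r) = \dfrac{k!}{(k - r)!\,r!}$.

In particular, the following relations are useful in computations,

$$\mu_2 = \nu_2 - \nu_1^2$$
$$\mu_3 = \nu_3 - 3\nu_2\nu_1 + 2\nu_1^3 \tag{7}$$
$$\mu_4 = \nu_4 - 4\nu_3\nu_1 + 6\nu_2\nu_1^2 - 3\nu_1^4.$$

The first of (7) is proved below and the others may be established
in a similar way.

$$\mu_2 = \frac{1}{N}\int_{-\infty}^{\infty} x^2 f(x)\,dx - \frac{2\bar{x}}{N}\int_{-\infty}^{\infty} x f(x)\,dx + \frac{\bar{x}^2}{N}\int_{-\infty}^{\infty} f(x)\,dx$$

$$= \frac{1}{N}\int_{-\infty}^{\infty} x^2 f(x)\,dx - \bar{x}^2 = \nu_2 - \nu_1^2.$$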
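
Relations (7) hold equally for a grouped frequency table, where sums replace the integrals. The following is a minimal sketch under that assumption; the table of values and frequencies is an illustrative one and is not taken from the text.

```python
def moment_about_origin(values, freqs, k):
    """nu_k: moment of order k about the origin for a frequency table."""
    total = sum(freqs)
    return sum(f * x ** k for x, f in zip(values, freqs)) / total


def moment_about_mean(values, freqs, k):
    """mu_k: moment of order k about the mean, computed directly."""
    total = sum(freqs)
    mean = moment_about_origin(values, freqs, 1)
    return sum(f * (x - mean) ** k for x, f in zip(values, freqs)) / total


def central_from_origin(nu1, nu2, nu3, nu4):
    """mu_2, mu_3, mu_4 obtained from the nu's by relations (7)."""
    mu2 = nu2 - nu1 ** 2
    mu3 = nu3 - 3 * nu2 * nu1 + 2 * nu1 ** 3
    mu4 = nu4 - 4 * nu3 * nu1 + 6 * nu2 * nu1 ** 2 - 3 * nu1 ** 4
    return mu2, mu3, mu4


values = [0, 1, 2, 3, 4]
freqs = [1, 4, 6, 4, 1]
nus = [moment_about_origin(values, freqs, k) for k in (1, 2, 3, 4)]
print(central_from_origin(*nus))
print([moment_about_mean(values, freqs, k) for k in (2, 3, 4)])
```

Both lines of output should show the same three moments, the first obtained through (7) and the second by direct computation about the mean.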





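In standard units the moment of order k is defined by

$$\alpha_k = \frac{1}{N}\int_{-\infty}^{\infty} t^k h(t)\,dt \tag{8}$$

where

$$t = \frac{x - \bar{x}}{\sigma},$$

$$h(t) = \sigma f(\sigma t + \bar{x}) = \sigma f(x). \tag{9}$$

From (8) we have

$$\alpha_0 = 1, \qquad \alpha_1 = 0, \qquad \alpha_2 = 1.$$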


Analogous definitions of moments could be given for probability
functions. When N = 1, in accordance with (3), the integrals in
(4) and (5) are also called expected values. The language of expected
values will be used in another chapter where we will be dealing more
with probability functions. Before proceeding with the discussion
of frequency curves, however, we will give an example of a
probability curve.



Example. The Cauchy curve is a classical example of a probability
distribution although its use in present day statistics is relatively
unimportant. Its equation is

$$y = \frac{b}{\pi(b^2 + x^2)}, \qquad -\infty \le x \le \infty, \quad b > 0. \tag{10}$$

The curve is symmetrical having its center at x = 0.





A simple derivation of this function is as follows. For a given real constant
b locate the point (0, b) as in the figure below. Let lines be drawn at random
through (0, b) and let $\theta$ be the variable angle between any such line and the
negative direction of the y-axis; $\theta$ varies between the limits $-\pi/2$
and $\pi/2$. The hypothesis is that all values of $\theta$ in this range are
equally likely. Denote the intercepts on the horizontal axis by x.
Clearly, $-\infty < x < \infty$. The relation between $\theta$ and x is

$$\theta = \tan^{-1}\frac{x}{b}.$$

[Figure: lines drawn through the point (0, b), meeting the horizontal axis at x.]

Under the hypothesis, the probability that an angle $\theta$ will be
contained between $\theta$ and $\theta + d\theta$ is $d\theta/\pi$.
By differentiation we find the relation between $d\theta$ and dx to be

$$\frac{d\theta}{\pi} = \frac{b\,dx}{\pi(b^2 + x^2)}. \tag{11}$$

Therefore, the points of intersection of the lines with the x-axis are distributed
so that the probability that a value of x will fall in the range dx is given by the
right-hand member of (11). Hence the probability function for the variable x is

$$f(x) = \frac{b}{\pi(b^2 + x^2)}$$

and the probability that x lies in a finite interval (c, d) is given by

$$P(c, d) = \int_c^d \frac{b\,dx}{\pi(b^2 + x^2)},$$

since the integral of the right-hand member of (11) from $-\infty$ to $\infty$ is equal to
unity as can easily be verified. However, we cannot speak here of the mean
value of x or of moments of higher order, since the integral

$$\int_{-\infty}^{\infty} \frac{x^k\,dx}{b^2 + x^2}$$

has no meaning for k > 0. This restriction does not apply to probability
functions in general.
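
The derivation above lends itself to a simple sampling experiment. The following is a minimal sketch, assuming a pseudo-random number generator with a fixed seed; the function names, the value b = 2, and the interval (−1, 3) are illustrative choices, and the closed form used for P(c, d) is the elementary integral of (11).

```python
import math
import random


def cauchy_intercept(b, rng):
    """Draw a line through (0, b) at a random angle and return its x-intercept."""
    theta = rng.uniform(-math.pi / 2.0, math.pi / 2.0)
    return b * math.tan(theta)


def interval_probability(b, c, d):
    """P(c, d) from integrating (11): [arctan(d/b) - arctan(c/b)] / pi."""
    return (math.atan(d / b) - math.atan(c / b)) / math.pi


def observed_fraction(b, c, d, trials, seed=0):
    """Relative frequency of intercepts falling in (c, d) in repeated trials."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if c < cauchy_intercept(b, rng) < d)
    return hits / trials


print(observed_fraction(2.0, -1.0, 3.0, 200_000))
print(interval_probability(2.0, -1.0, 3.0))
```

The observed relative frequency should lie close to the theoretical probability, although the sample mean of the intercepts does not settle down as the number of trials grows, in keeping with the remark that the mean has no meaning here.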


3. The Pearson System. There are two systems of generalized 
frequency curves in common use: the Pearson system and the 
Gram-Charlier system . 

During the years 1895-1916 Karl Pearson published papers in 
which he showed that a set of frequency curves could be obtained 





by assigning values to the parameters in a certain first order differential
equation. The Pearson school claims that all the different
types of frequency distributions that arise in practical statistics
can be represented by the solutions of this equation.

With regard to the genesis of the Pearson system, one point of
view is to regard it as empirical. Thus, starting with the differential
equation

$$\frac{dy}{dt} = \frac{y(m - t)}{a + bt + ct^2}, \tag{12}$$

it is observed that the solutions of (12) must satisfy certain geometrical
properties of unimodal frequency distributions, namely, (a) the
curve should vanish at the ends of the range, i.e., as $y \to 0$, $dy/dt \to 0$;
(b) when t = m, corresponding to a mode, dy/dt = 0.

Among the solutions of (12) there are several types of curves,
the shapes depending on the parameters a, b, c, and m. Examples
of symmetrical, skewed, U-shaped and J-shaped curves with finite
and infinite range in either or both directions, are shown in Figure 9.



Fig. 9. Typical Curves of the Pearson System 


The parameters in (12) can be expressed in terms of the moments
of the system. Multiplying (12) by $t^k\,dt$ and integrating over all
admissible values of t, we have

$$\int \frac{dy}{dt}\,(at^k + bt^{k+1} + ct^{k+2})\,dt = \int y(mt^k - t^{k+1})\,dt. \tag{13}$$

Integrating the left-hand side by parts, we obtain

$$\Big[t^k(a + bt + ct^2)\,y\Big] - \int y\,[akt^{k-1} + b(k + 1)t^k + c(k + 2)t^{k+1}]\,dt = \int y(mt^k - t^{k+1})\,dt. \tag{14}$$

If $yt^{k+2}$ vanishes at the ends of the range, then the first expression
in (14) vanishes. If, in (12), y = h(t) we have from (8) and (14),

$$m\alpha_k + ak\alpha_{k-1} + b(k + 1)\alpha_k + c(k + 2)\alpha_{k+1} = \alpha_{k+1}. \tag{15}$$

Assigning k successively the values k = 0, 1, 2, 3, we obtain from
(15) the four equations

$$\begin{aligned} m + b &= 0 \\ a + 3c &= 1 \\ m + 3b + 4c\alpha_3 &= \alpha_3 \\ m\alpha_3 + 3a + 4b\alpha_3 + 5c\alpha_4 &= \alpha_4 \end{aligned} \tag{16}$$

from which the parameters can be determined. Solving (16) we
obtain

$$\begin{aligned} m &= \frac{1}{D}[\alpha_3(3 + \alpha_4)] \\ a &= \frac{1}{D}[3\alpha_3^2 - 4\alpha_4] \\ b &= \frac{1}{D}[-\alpha_3(3 + \alpha_4)] \\ c &= \frac{1}{D}[6 + 3\alpha_3^2 - 2\alpha_4] \\ D &= 18 + 12\alpha_3^2 - 10\alpha_4. \end{aligned} \tag{17}$$

Carver* has expressed (17) in the more convenient form

$$m = -\frac{\alpha_3}{2(1 + 2\delta)}, \qquad b = \frac{\alpha_3}{2(1 + 2\delta)}, \qquad a = \frac{2 + \delta}{2(1 + 2\delta)}, \qquad c = \frac{\delta}{2(1 + 2\delta)}, \qquad \delta = \frac{2\alpha_4 - 3\alpha_3^2 - 6}{\alpha_4 + 3}. \tag{18}$$

* See the Handbook of Mathematical Statistics, Rietz et al.

Substitution of the above values into (15) yields an important
recursion formula for the moments of the Pearson system:

$$\alpha_{k+1} = \frac{k}{2 - (k - 2)\delta}\,[\alpha_3\alpha_k + (2 + \delta)\alpha_{k-1}]. \tag{19}$$
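
The recursion (19) is convenient for machine computation. The following is a minimal sketch, assuming the recursion in the form $\alpha_{k+1} = k[\alpha_3\alpha_k + (2 + \delta)\alpha_{k-1}]/[2 - (k - 2)\delta]$ with $\delta$ as defined in (18); the function names are illustrative, and no provision is made for the case in which the denominator $2 - (k - 2)\delta$ vanishes.

```python
def carver_delta(alpha3, alpha4):
    """delta of (18), computed from the third and fourth standard moments."""
    return (2.0 * alpha4 - 3.0 * alpha3 ** 2 - 6.0) / (alpha4 + 3.0)


def pearson_standard_moments(alpha3, alpha4, highest):
    """Return [alpha_0, ..., alpha_highest] built up by the recursion (19)."""
    delta = carver_delta(alpha3, alpha4)
    alpha = [1.0, 0.0, 1.0]  # alpha_0, alpha_1, alpha_2 in standard units
    for k in range(2, highest):
        next_alpha = (k / (2.0 - (k - 2) * delta)
                      * (alpha3 * alpha[k] + (2.0 + delta) * alpha[k - 1]))
        alpha.append(next_alpha)
    return alpha


# Normal curve: alpha_3 = 0, alpha_4 = 3, so that delta = 0.
print(pearson_standard_moments(0.0, 3.0, 6))

# A Type III case: alpha_3 = 1 and alpha_4 = 4.5 also give delta = 0.
print(pearson_standard_moments(1.0, 4.5, 6))
```

For the normal curve the recursion reproduces the familiar even moments 1, 3, 15 and zero odd moments; in every case it returns the assigned $\alpha_3$ and $\alpha_4$ at k = 2 and k = 3, which serves as a check on the formula.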

For our purposes the most important curves in the Pearson system
are the Type VII (normal curve) and Type III. These will now
be discussed in some detail.

Type VII. If $\alpha_3 = 0 = \delta$, then (12) becomes

$$\frac{dy}{y} = -t\,dt$$

which upon integration yields the so-called normal curve

$$y = Ce^{-t^2/2}, \qquad -\infty < t < \infty. \tag{20}$$

The constant C may be determined so that the area under the curve
is N. Imposing this condition and making use of (10a) of Chapter II
we find that $C = N/(2\pi)^{1/2}$, and so (20) becomes

$$y = \frac{N}{(2\pi)^{1/2}}\,e^{-t^2/2}.$$

It is conventional to write this in the form

$$y = N\phi(t) \tag{21}$$

where

$$\phi(t) = \frac{1}{(2\pi)^{1/2}}\,e^{-t^2/2}, \qquad t = \frac{x - \bar{x}}{\sigma}.$$

We may call $\phi(t)$ the normalized normal curve.

Type III. If $\delta = 0$ but $\alpha_3 \ne 0$ we see from (18) that (12)
becomes

$$\frac{dy}{dt} = -\frac{\left(\dfrac{\alpha_3}{2} + t\right) y}{1 + \dfrac{\alpha_3}{2}\,t}$$

which upon integration yields the Type III curve

$$y = K(A + t)^{A^2 - 1} e^{-At} \tag{22}$$




where $A = 2/\alpha_3$, the range being $(-A, \infty)$. The criterion for a
Type III curve is that $\delta = 0$. That is, if a Type III curve is to
represent an observed distribution the observed moments should
satisfy, at least approximately, the relation

$$2\alpha_4 - 3\alpha_3^2 - 6 = 0.$$

Definitions of moments of an observed distribution are given in
Part I.

The constant K in (22) may be determined by the condition

$$\int_{-A}^{\infty} y\,dt = N. \tag{23}$$

This integral can be evaluated by means of the Gamma function.
Let $A^2 = n/2$ and let $A(A + t) = \chi^2/2$. Then we have
$(A + t)^{A^2 - 1} = (\chi^2/2)^{(n/2) - 1} A^{1 - (n/2)}$, $e^{-At} = e^{n/2} e^{-\chi^2/2}$, and $dt = d(\chi^2/2)/A$.

Making these substitutions in (23) we obtain

$$KA^{-n/2} e^{n/2} \int_0^\infty (\chi^2/2)^{(n/2) - 1} e^{-\chi^2/2}\,d(\chi^2/2) = KA^{-n/2} e^{n/2}\,\Gamma(n/2) = N$$

and therefore

$$K = \frac{N A^{n/2} e^{-n/2}}{\Gamma(n/2)}.$$

So with $\chi^2$ as the independent variable, (22) becomes

$$y = \frac{N}{2\Gamma(n/2)}\,(\chi^2/2)^{(n/2) - 1} e^{-\chi^2/2}. \tag{22a}$$

When N = 1, (22a) defines the probability distribution of $\chi^2$. This
is an important function which we shall use in subsequent discussions.

The designation “Type III” is usually restricted to the case
for which $A^2 \ne 1$. When $A^2 > 1$, that is, when $|\alpha_3| < 2$, the curve is
bell-shaped as shown in Figure 10.

In the Pearson system, the distance between the mean and
mode is $m = -\alpha_3/[2(1 + 2\delta)]$, and is a measure of skewness.




Under the conditions imposed for Type VII, m = 0. For Type III,
however, $m = -\alpha_3/2$ and therefore we have

$$(\text{mean} - \text{mode}) = \frac{\alpha_3\sigma}{2}.$$

Because of this relation, $|\alpha_3|/2$ is sometimes used as a measure of
skewness in observed distributions. The curve for $\alpha_3 = -k$ (k =
a constant) is a reflection of that for $\alpha_3 = k$ through the line t = 0.

When $A^2 < 1$, that is, when $|\alpha_3| > 2$, the curve is J-shaped with
an infinite ordinate at t = −A.

The special case for which $A^2 = 1$ is known in the Pearson system
as Type X. When $\alpha_3 = \pm 2$, (22) becomes

$$y = Ke^{\mp t}.$$

This is also known as Laplace’s second frequency curve. 

Tables of ordinates and areas of the Type III curve have been pub¬ 
lished by Salvosa in the Annals of Mathematical Statistics , vol. 1, no. 2. 

A systematic treatment of all the curves in the Pearson system 
has recently been given in a paper entitled A New Exposition and 
Chart for the Pearson System of Frequency Curves by C. C. Craig, 
Annals of Mathematical Statistics, vol. 7, no. 1, pp. 16-28. 

4. Genesis of the Pearson Curves in the Theory of Probability. 
The differential equation (12) is supposed to have some support 
in the theory of probability. This claim rests on the assumption 
that the distribution of statistical material may be likened to a priori 
distributions in certain urn schemata. The method by which (12) 
is associated with underlying probabilities is started by considering 
the following problem. 

An urn contains n balls of which np are white, so that the probability
of drawing a white ball in a single trial is p. The rest of the
balls, nq, are black, and the probability of failure to draw a white
ball in a single trial is q = 1 − p. If s balls are drawn from the
urn one at a time with replacements after each draw, what is the
probability, B(x), of drawing exactly x white balls and (s − x)
black balls?

From the Bernoulli theory it is known that the probabilities of
getting x = 0, 1, 2, ···, s, successes in s trials are given by the
successive terms of the binomial

$$(q + p)^s = \sum_{x=0}^{s} B(x) \tag{24}$$





where

$$B(x) = C(s, x)\,p^x q^{s-x}.$$

Representing the terms B(x) by ordinates $y_x$, one may plot the
(s + 1) points $(x, y_x)$. Through these (s + 1) points one may
imagine a curve that can be represented by an analytic function.
Since

$$y_x = C(s, x)\,p^x q^{s-x}$$

and hence

$$y_{x+1} = C(s, x + 1)\,p^{x+1} q^{s-x-1},$$

we have

$$\frac{y_{x+1}}{y_x} = \frac{sp - px}{qx + q}. \tag{25}$$

From (25) we obtain

$$\frac{y_{x+1} - y_x}{y_{x+1} + y_x} = \frac{sp - q - x}{sp + q + (q - p)x}. \tag{26}$$

Now the mean of any two ordinates ($y_x$ and $y_{x+1}$) may be considered
as approximately equal to the ordinate ($y_{x+1/2}$) midway between
them. The slope of the line joining any two points $(x, y_x)$ and
$(x + 1, y_{x+1})$ is also approximately equal to the slope of the tangent
at the point midway between these two points on the continuous
curve. Under these two assumptions, (26) may be written as

$$\frac{D_x y_{x+1/2}}{y_{x+1/2}} = \frac{2(sp - q - x)}{sp + q + (q - p)x}. \tag{27}$$

The right member of this equation is, therefore, the derivative
of log y at the point $(x + \tfrac{1}{2}, y_{x+1/2})$. At any point $(x, y_x)$ this
derivative is

$$\frac{d(\log y)}{dx} = \frac{2\{sp - q - (x - \tfrac{1}{2})\}}{sp + q + (q - p)(x - \tfrac{1}{2})}. \tag{28}$$

If p = q = 1/2 then (28) becomes

$$\frac{d(\log y)}{dx} = -\frac{x - s/2}{(s + 1)/4}$$






which is of the form

$$\frac{dy}{y} = -\frac{(x - m)\,dx}{a}, \qquad a > 0,$$

and which, upon integration, yields the normal curve

$$y = ke^{-P} \tag{29}$$

where

$$P = \frac{(x - m)^2}{2a}.$$

The next step consists in dealing with the case $p \ne q$. From (28)
we have

$$\frac{d}{dx}(\log y) = -\frac{\dfrac{q - p}{2} + x - sp}{spq + \dfrac{1}{4} + \dfrac{(x - sp)(q - p)}{2}}.$$

If we set

$$\alpha_3 = \frac{q - p}{(spq)^{1/2}} \quad \text{and} \quad t = \frac{x - sp}{(spq)^{1/2}}$$

the above equation becomes

$$\frac{d(\log y)}{dt} = -\frac{\dfrac{\alpha_3}{2} + t}{1 + \dfrac{\alpha_3}{2}\,t + \dfrac{1}{4spq}}. \tag{30}$$

If spq is so large that 1/4spq is negligibly small, (30) becomes

$$\frac{d(\log y)}{dt} = -\frac{\dfrac{\alpha_3}{2} + t}{1 + \dfrac{\alpha_3}{2}\,t} \tag{31}$$

which upon integration yields the Type III curve. It is evident
from (31) that this curve approaches the normal curve as a limit as
$\alpha_3 \to 0$.

With p = q, (28) is of the form (12) when b = c = 0. With
$p \ne q$, (28) is of the form (12) when c = 0. To produce, in the
theory of probability, an expression comparable to (12) when both
b and c are different from zero it is necessary to consider a more





general urn problem. So far the underlying probability, p, has 
been constant. If we consider the urn schemata previously described, 
but remove the restriction of replacements, then the chance of success 
is not constant from trial to trial but depends upon the results of 
previous trials. Thus, without replacements, the chances of obtaining exactly x = 0, 1, 2, ..., s white balls in a draw of s balls are given by the successive terms of the hypergeometric series

(32)   $\dfrac{1}{C(n, s)}\,\{C(np, 0)C(nq, s) + C(np, 1)C(nq, s - 1) + \cdots + C(np, x)C(nq, s - x) + \cdots + C(np, s)C(nq, 0)\}$

in which the general term is

$H(x) = \dfrac{(np)!\,(nq)!\,s!\,(n - s)!}{(np - x)!\,(nq - s + x)!\,n!\,x!\,(s - x)!}.$


By representing the terms of this series as ordinates of a frequency 
polygon, it is possible to show that* the slope, at the mid-point of 
any side, divided by the ordinate at that point is equal to a fraction 
whose numerator is a linear function of x and whose denominator 
is a quadratic function of x. It is clear that (12) gives a general 
statement of this property. 

Since the hypergeometric series is associated with (12) and the 
Bernoulli series is associated with a special case of (12), viz., when 
c = 0, we should quite naturally expect that the Bernoulli series 
is a special case of the hypergeometric series. Writing H(x) in the form

$H(x) = \dfrac{s!}{x!\,(s - x)!} \cdot \dfrac{p\{p - 1/n\} \cdots \{p - (x - 1)/n\}\; q\{q - 1/n\} \cdots \{q - (s - x - 1)/n\}}{\{1 - 1/n\} \cdots \{1 - (x - 1)/n\}\{1 - x/n\} \cdots \{1 - (s - 1)/n\}}$

it is obvious that

$\lim_{n \to \infty} H(x) = C(s, x)\,p^x q^{s-x} = B(x).$

When $n = \infty$, there is an infinite supply in the urn, so the probability, p, remains constant from trial to trial without replacements. In other words, sampling from a finite supply with replacements is the same as sampling from an infinite supply without replacements.
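The limit above can be checked numerically. The following is a minimal sketch, not part of the original text: the function names and the values p = 0.3, s = 5, x = 2 are illustrative assumptions. It evaluates the general term H(x) for urns of increasing size and compares it with the Bernoulli term B(x).

```python
from math import comb

def hypergeometric_term(n_white, n_black, s, x):
    """H(x): chance of exactly x white balls in s draws without replacement."""
    n = n_white + n_black
    return comb(n_white, x) * comb(n_black, s - x) / comb(n, s)

def bernoulli_term(p, s, x):
    """B(x) = C(s, x) p^x q^(s-x): the same chance when p stays constant."""
    return comb(s, x) * p**x * (1 - p) ** (s - x)

s, x = 5, 2                      # illustrative values, not from the text
for n in (10, 100, 10_000):
    white = 3 * n // 10          # np with the assumed p = 0.3
    print(n, hypergeometric_term(white, n - white, s, x), bernoulli_term(0.3, s, x))
```

As n grows with p held fixed, the printed hypergeometric value should move toward the Bernoulli value, which is the statement Lim H(x) = B(x).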

* See 1. Elderton, Frequency Curves and Correlation. 

2. Rietz, Mathematical Statistics (Carus Monograph), Chapter III. 





5. Further Discussion of the Normal Curve. We will now return 
to a discussion of the normal curve, giving some proofs which had to 
be omitted in Part I, and supplying explanations which in one 
instance or another perhaps had to be read between the lines there. 

A. Fitting the Curve. If (29) is to represent an observed distribution, the parameters m, a, and k may be determined by the principle of moments. Equating the kth functional moment to the kth moment of observed data, for k = 0, 1, 2, we have three simultaneous equations

(33)   $k\int_{-\infty}^{\infty} e^{-(x-m)^2/2a}\,dx = N,$

       $k\int_{-\infty}^{\infty} e^{-(x-m)^2/2a}\,x\,dx = N\bar{x},$

       $k\int_{-\infty}^{\infty} e^{-(x-m)^2/2a}\,x^2\,dx = N\nu_2,$

in which the parameters are the unknowns.

The solution of these equations can be made to depend upon the integral

(34)   $\int_{-\infty}^{\infty} e^{-y^2/2a}\,dy = (2\pi a)^{1/2}$

which is evaluated in Chapter II. Using this result and letting y = x − m, the first of equations (33) becomes

(a)   $k(2\pi a)^{1/2} = N.$

The second becomes

$k\int_{-\infty}^{\infty} e^{-y^2/2a}\,y\,dy + km\int_{-\infty}^{\infty} e^{-y^2/2a}\,dy = N\bar{x}.$

In the above relation, the first integral vanishes because the integrand is an odd function. So, using (34), we have

(b)   $km(2\pi a)^{1/2} = N\bar{x}.$

The third integral in (33) may be written in the form

$k\int_{-\infty}^{\infty} e^{-y^2/2a}\,y^2\,dy + 2km\int_{-\infty}^{\infty} e^{-y^2/2a}\,y\,dy + km^2\int_{-\infty}^{\infty} e^{-y^2/2a}\,dy.$

Upon integrating (by parts) the first integral in the above expression and evaluating the other integrals, we obtain

(c)   $k\sqrt{2\pi a}\,(m^2 + a) = N\nu_2.$



From (a) and (b) we find $m = \bar{x}$. From (a) and (c) we have

$m^2 + a = \nu_2,$

and so

$a = \mu_2 = \sigma^2.$

Therefore, (29) becomes

(35)   $y = \dfrac{N}{\sigma\sqrt{2\pi}}\, e^{-(x - \bar{x})^2/2\sigma^2}.$

B. Moments. The general moment of odd order of (35) about the mean is given by

$\mu_{2k+1} = \dfrac{1}{N}\int_{-\infty}^{\infty} y\,(x - \bar{x})^{2k+1}\,dx.$


But the right member vanishes because the integrand is an odd 
function. Therefore, all moments of odd order of the normal curve 
taken about the mean are zero. 

The general moment of even order is

$\mu_{2k} = \dfrac{1}{N}\int_{-\infty}^{\infty} y\,(x - \bar{x})^{2k}\,dx.$

Integrating the right member by parts, letting $u = (x - \bar{x})^{2k-1}$, the following recursion relation is obtained for even moments

(36)   $\mu_{2k} = (2k - 1)\,\sigma^2 \mu_{2k-2}.$

Then when k = 1, $\mu_2 = \sigma^2$; when k = 2, $\mu_4 = 3\mu_2^2$; etc.

A recursion formula for the moments in standard units may also be obtained from (19). Under the conditions imposed for Type VII, (19) becomes

$\alpha_{k+1} = k\,\alpha_{k-1}, \qquad k = 1, 3, 5, \ldots.$

Hence,

$\alpha_2 = 1,$

$\alpha_4 = 3,$

$\alpha_6 = 1 \cdot 3 \cdot 5,$

$\alpha_{2k} = 1 \cdot 3 \cdot 5 \cdots (2k - 1) = \dfrac{(2k)!}{2^k\,k!}.$
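The agreement between the recursion and the closed form for the even moments in standard units can be verified directly. This is a small sketch with hypothetical function names, using exact integer arithmetic.

```python
from math import factorial

def even_moment_by_recursion(k):
    """alpha_{2k} built from alpha_0 = 1 and alpha_{2j} = (2j - 1) * alpha_{2j-2}."""
    value = 1
    for j in range(1, k + 1):
        value *= 2 * j - 1
    return value

def even_moment_closed_form(k):
    """alpha_{2k} = (2k)! / (2^k k!)."""
    return factorial(2 * k) // (2**k * factorial(k))

for k in range(6):
    print(2 * k, even_moment_by_recursion(k), even_moment_closed_form(k))
```

The two columns should coincide for every k, beginning 1, 1, 3, 15.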



C. Quadrature. Some writers use the term quadrature for the evaluation of an integral. The definite integral

(37)   $\Phi(t) = \dfrac{h}{\sqrt{\pi}}\int_0^t e^{-h^2x^2}\,dx$

is commonly called the probability integral. Clearly it is a function of the variable limit t. Although (37) cannot be evaluated in finite form, it can be computed by expanding the integrand into a power series and integrating as many terms as may be needed.

In (37) let y = hx. When x = t, y = ht. So (37) becomes

(38)   $\Phi(t) = \dfrac{1}{\sqrt{\pi}}\int_0^{ht} e^{-y^2}\,dy.$

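Expanding the integrand of (38) we have

$e^{-y^2} = 1 - y^2 + \dfrac{y^4}{2!} - \dfrac{y^6}{3!} + \cdots + (-1)^{n-1}\dfrac{y^{2n-2}}{(n - 1)!} + \cdots.$

Termwise integration yields the result

(39)   $\dfrac{1}{\sqrt{\pi}}\int_0^y e^{-y^2}\,dy = \dfrac{1}{\sqrt{\pi}}\left\{y - \dfrac{y^3}{3} + \dfrac{y^5}{10} - \dfrac{y^7}{42} + \dfrac{y^9}{216} - \dfrac{y^{11}}{1320} + \cdots\right\}.$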


This series converges for all values of y, and the error made in stopping 
at any term is numerically less than the first term neglected. For 
small values of y it converges rapidly and is a satisfactory method for 
computing when y < 1. 
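For small y the series (39) is easy to evaluate by machine. The sketch below is an illustration only: the function name and the choice of twenty terms are assumptions, and the comparison value uses the error function from Python's standard library, since (1/√π)∫₀^y e^{−u²} du equals erf(y)/2.

```python
from math import erf, factorial, pi, sqrt

def ascending_series(y, terms=20):
    """(1/sqrt(pi)) * integral from 0 to y of exp(-u^2) du, by the power series (39)."""
    total = sum(
        (-1) ** n * y ** (2 * n + 1) / (factorial(n) * (2 * n + 1))
        for n in range(terms)
    )
    return total / sqrt(pi)

for y in (0.25, 0.5, 1.0):
    print(y, ascending_series(y), erf(y) / 2)
```

The coefficients 1/(n!(2n + 1)) reproduce the denominators 1, 3, 10, 42, 216, 1320 appearing in (39).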

But for large values of y, (39) converges too slowly to be practical; too many terms are required. It is therefore important to obtain an expansion in descending powers of y. To this end write

(40)   $\int_0^y e^{-y^2}\,dy = \int_0^{\infty} e^{-y^2}\,dy - \int_y^{\infty} e^{-y^2}\,dy = \dfrac{\sqrt{\pi}}{2} - \int_y^{\infty} e^{-y^2}\,dy$

and

$\int_y^{\infty} e^{-y^2}\,dy = \int_y^{\infty} \dfrac{1}{y}\,y e^{-y^2}\,dy.$

Integrating the last integral by parts we obtain

$\int_y^{\infty} e^{-y^2}\,dy = \dfrac{e^{-y^2}}{2y} - \dfrac{1}{2}\int_y^{\infty} \dfrac{e^{-y^2}}{y^2}\,dy.$

Integrating successively by parts gives the result

(41)   $\int_y^{\infty} e^{-y^2}\,dy = \dfrac{e^{-y^2}}{2y}\left\{1 - \dfrac{1}{2y^2} + \dfrac{1 \cdot 3}{4y^4} - \dfrac{1 \cdot 3 \cdot 5}{8y^6} + \cdots\right\}.$

From (40) and (41) we have the final result

(42)   $\dfrac{1}{\sqrt{\pi}}\int_0^y e^{-y^2}\,dy = \dfrac{1}{2} - \dfrac{e^{-y^2}}{2y\sqrt{\pi}}\left\{1 - \dfrac{1}{2y^2} + \dfrac{1 \cdot 3}{4y^4} - \dfrac{1 \cdot 3 \cdot 5}{8y^6} + \cdots + (-1)^n T_{n+1} + \cdots\right\}$

where

$T_{n+1} = \dfrac{1 \cdot 3 \cdot 5 \cdots (2n - 1)}{2^n y^{2n}}$

and (n + 1) is the number of the term. The series in (42) is called an asymptotic or semi-convergent series because it converges until a minimum term is reached and then diverges. The general term $T_{n+1}$ decreases so long as $n < y^2$. But after the integrations by parts have been performed so many times that $n > y^2$, $T_{n+1}$ increases. Of course the integrations should not be carried further. The value obtained by using the series in (42) will differ from the true value by less than the last term retained.

Tables of (37) may be computed by means of (39) for y (= ht) < 1 and by (42) for y > 1. Such tables were computed long ago and are available in many places.


Example. Evaluate (37) for t = 3 and check the result with the value of $\int_0^t \phi(t)\,dt$ for t = 3 given in the tables in the Appendix.

Solution. Since $y = t/\sqrt{2}$ we are to evaluate (42) for $y = 3/\sqrt{2}$. Substituting this value in (42) we have

$.5 - \dfrac{e^{-9/2}\sqrt{2}}{2\sqrt{\pi} \cdot 3}\left[1 - \dfrac{1}{9} + \dfrac{3}{81} - \dfrac{15}{729} + \dfrac{105}{6561}\right]$

$= .5 - \tfrac{1}{3}\,\phi(3)\,(.9213)$

$= .5 - .00136 = .49864.$

The value given in the tables is .49865.
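The arithmetic of this example can be reproduced with a short routine. The following is a sketch under stated assumptions: the function name is hypothetical, the number of terms retained (five) follows the example, and the reference value is taken from the standard-library error function rather than from the tables in the Appendix.

```python
from math import erf, exp, pi, sqrt

def asymptotic_series(y, terms):
    """Right member of (42), keeping the first `terms` terms of the bracket."""
    bracket = 0.0
    term = 1.0                                # T_1 = 1
    for n in range(terms):
        bracket += (-1) ** n * term
        term *= (2 * n + 1) / (2 * y * y)     # T_{n+2} = T_{n+1} (2n + 1) / (2 y^2)
    return 0.5 - exp(-y * y) / (2 * y * sqrt(pi)) * bracket

y = 3 / sqrt(2)
approx = asymptotic_series(y, terms=5)
reference = erf(y) / 2
print(approx, reference, abs(approx - reference))
```

With five terms the result should agree with .49864 to five decimals, and its distance from the reference value should be smaller than the last term retained, as the text asserts.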




6. The Gram-Charlier Series. If a function f(x) gives only a rough approximation to a frequency distribution, a more accurate representation may be obtained by using the first few terms of the series

(43)   $F(x) = c_0 f(x) + c_1 f^{(1)}(x) + c_2 f^{(2)}(x) + \cdots + c_n f^{(n)}(x) + \cdots$

where f(x), called the "generating function," gives a first approximation to the given distribution, and $f^{(n)}(x)$ is the nth derivative of f(x) with respect to x.

It should be observed that series representation is also involved in the Pearson system. For, suppose the differential equation underlying that system is written in the form

$\dfrac{dy}{dx} = \dfrac{y(a - x)}{f(x)}.$

Then if it be assumed that f(x) is expressible as a power series which 
is so rapidly convergent that the first few terms are sufficient, we have 
the form given in (12). In the Pearson system the series occurs in 
the differential equation of the function whereas in the Gram-Charlier 
system it occurs in the function itself. 

If in (43) the normal curve is taken as the generating function then F(x) is known as the Gram-Charlier Type A series. In discussing this series no essential loss of generality is suffered by using standard units. Thus we may write

(44)   $F(t) = c_0\,\phi(t) + c_1\,\phi^{(1)}(t) + c_2\,\phi^{(2)}(t) + \cdots + c_n\,\phi^{(n)}(t) + \cdots$

where $\phi(t)$ is defined in (21). The moments of F(t) are defined by

(45)   $\alpha_n = \int_{-\infty}^{\infty} F(t)\,t^n\,dt$

and it follows that $\alpha_0 = 1$, $\alpha_1 = 0$, $\alpha_2 = 1$.

The coefficients $c_n$ in (44) may be expressed in terms of the moments $\alpha_n$, because the functions $\phi^{(n)}(t)$ and the Hermite polynomials $H_m(t)$ defined by the relation

(46)   $\phi^{(n)}(t) = (-1)^n H_n(t)\,\phi(t)$

form a biorthogonal system. That is

(47)   $\int_{-\infty}^{\infty} \phi^{(n)}(t)\,H_m(t)\,dt = 0$   for $m \neq n$,

(48)   $\int_{-\infty}^{\infty} \phi^{(n)}(t)\,H_m(t)\,dt = (-1)^n\,n!$   for $m = n$.



Proofs of (47) and (48) are available in the literature* and will be omitted here. The recursion relation

(49)   $H_{n+1}(t) = t\,H_n(t) - n\,H_{n-1}(t)$

can be established.† By differentiating we find from (46) that $H_1 = t$ and since $H_0 = 1$ we can use (49) for $n \geq 1$.

To make use of the biorthogonal property noted above we multiply both members of (44) by $H_n(t)$ and integrating, under the assumption that the series is uniformly convergent, we obtain

(50)   $\int_{-\infty}^{\infty} F(t)\,H_n(t)\,dt = c_n\int_{-\infty}^{\infty} \phi^{(n)}(t)\,H_n(t)\,dt = c_n(-1)^n\,n!$

since all terms of the right member vanish except the one with the coefficient $c_n$. Hence from (50) we have

(51)   $c_n = \dfrac{(-1)^n}{n!}\int_{-\infty}^{\infty} F(t)\,H_n(t)\,dt.$

From (51), (49), and (45) we obtain the following results:

$c_0 = \int_{-\infty}^{\infty} F(t)\,dt = 1,$

$c_1 = -\int_{-\infty}^{\infty} F(t)\,t\,dt = 0,$

$c_2 = \dfrac{1}{2!}\int_{-\infty}^{\infty} F(t)(t^2 - 1)\,dt = 0,$

$c_3 = \dfrac{1}{3!}\int_{-\infty}^{\infty} F(t)(-t^3 + 3t)\,dt = -\dfrac{\alpha_3}{3!},$

$c_4 = \dfrac{1}{4!}\int_{-\infty}^{\infty} F(t)(t^4 - 6t^2 + 3)\,dt = \dfrac{\alpha_4 - 3}{4!}.$

We have, therefore,

(52)   $F(t) = \phi(t) - \dfrac{\alpha_3}{3!}\,\phi^{(3)}(t) + \dfrac{\alpha_4 - 3}{4!}\,\phi^{(4)}(t) + \cdots$

and $F(x) = \dfrac{1}{\sigma}F(t)$.

The values of $\phi(t)$, of its integral, and of its second to eighth derivative, are given to five places of decimals in Glover's Tables.

* See Rietz, Mathematical Statistics, pp. 165-168.
† See Levy and Roth, Elements of Probability. Oxford. 1936.
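In the absence of tables, the terms of (52) may be computed from the recursion (49) together with the relation (46). The sketch below is illustrative only: the function names are assumptions, and only the three terms written out in (52) are retained.

```python
from math import exp, pi, sqrt

def hermite(n, t):
    """H_n(t) from H_0 = 1, H_1 = t and the recursion H_{n+1} = t H_n - n H_{n-1}."""
    if n == 0:
        return 1.0
    h_prev, h = 1.0, t
    for k in range(1, n):
        h_prev, h = h, t * h - k * h_prev
    return h

def phi(t):
    """Normal curve in standard units."""
    return exp(-t * t / 2) / sqrt(2 * pi)

def type_a(t, alpha3, alpha4):
    """First three terms of (52), using phi^(n) = (-1)^n H_n phi from (46)."""
    d3 = -hermite(3, t) * phi(t)      # third derivative of phi
    d4 = hermite(4, t) * phi(t)       # fourth derivative of phi
    return phi(t) - alpha3 / 6 * d3 + (alpha4 - 3) / 24 * d4

for t in (-3, -2, -1, 0, 1, 2, 3):
    print(t, type_a(t, alpha3=-1.2, alpha4=3.0))
```

When alpha3 = 0 and alpha4 = 3 the correction terms vanish and the routine returns the normal ordinate itself.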




Exercises 


1. Prove that the points of inflection of the normal curve are equidistant from the mode. What are the coordinates of these points?

2. If x has the distribution function y = f(x), with total frequency 1, the mean deviation, M, about the value v is defined by

$M = \int_{-\infty}^{\infty} |x - v|\,f(x)\,dx.$

Prove that M is a minimum when v is the median, that is, when the ordinate at x = v bisects the area under y = f(x).

Solution. We may write the expression for M in the form

$M = \int_{-\infty}^{v} (v - x) f(x)\,dx + \int_{v}^{\infty} (x - v) f(x)\,dx.$

It is shown in treatises on advanced calculus that if

$u(\theta) = \int_a^b f(x, \theta)\,dx,$

$\theta$ being a parameter and a and b being functions of $\theta$, then

$\dfrac{du}{d\theta} = \int_a^b \dfrac{\partial f}{\partial \theta}\,dx + f(b, \theta)\dfrac{db}{d\theta} - f(a, \theta)\dfrac{da}{d\theta}.$

Therefore, differentiating M with respect to v and equating the result to zero, we have

$\int_{-\infty}^{v} f(x)\,dx - \int_{v}^{\infty} f(x)\,dx = 0.$

So M is a minimum when $\int_{-\infty}^{v} f(x)\,dx = \int_{v}^{\infty} f(x)\,dx$, that is, when the partial areas to the right and left of v are equal. (It is left to the student to show that M is actually a minimum when dM/dv = 0.)

3. Prove that the relation between the mean deviation (about the mean) and the standard deviation of the normal curve (in arbitrary units) is

$M = (2/\pi)^{1/2}\sigma = .798\sigma$, approximately.

Hint. By definition,

$M = \dfrac{1}{N}\int_{-\infty}^{\infty} y\,|x - \bar{x}|\,dx = \sigma\int_{-\infty}^{\infty} \phi(t)\,|t|\,dt = 2\sigma\int_0^{\infty} \phi(t)\,t\,dt.$

4. Suppose x is distributed in accord with the frequency curve $y = Ce^{-x/a}$, $0 \leq x \leq \infty$, a being a positive constant and C being determined by the condition that the area under the curve is N. Evaluate $\nu_k$ successively for k = 1, 2, 3, 4. Then find $\mu_k$ for k = 2, 3, 4, and finally obtain the values $\bar{x} = a$, $\sigma = a$, $\alpha_3 = 2$, $\alpha_4 = 9$.

5. Given $f(x) = Cx^{n-1}e^{-x}$, $0 \leq x \leq \infty$, where C is determined by the condition that the area under the curve is unity. Evaluate $\nu_k$ for k = 1 to 4, $\mu_k$ for k = 2 to 4, and $\alpha_k$ for k = 3, 4. Show that $\alpha_3$ and $\alpha_4$ satisfy the criterion $2\alpha_4 - 3\alpha_3^2 - 6 = 0$.



6. State the differential equation underlying the Pearson system of frequency curves and derive the equation of the normal curve as a special solution of this equation. Evaluate the constant of integration so that the area under the curve is unity.

7. Discuss the Type III curve.

8. Show that y in (22) vanishes when t = −A and $t = \infty$.

9. Read Chapter III of the Carus Monograph on Mathematical Statistics by H. L. Rietz.

10. Explain how the probability integral (37) may be evaluated for (a) small values of t, (b) large values of t.

11. Evaluate (37) for (a) $t = \sqrt{2}/2$, (b) $t = 2\sqrt{2}$.

12. Consult the reference cited for the proofs of (47) and (48) and give a report on them.

13. By successive differentiation of $\phi(t)$ evaluate $H_m(t)$ from (46) for m = 1, 2, 3, 4. Check your results with (49) for n = 1, 2, 3.

14. Making use of the biorthogonal property of Hermite polynomials and derivatives of the normal curve, derive the values of $c_n$, n = 0 to 4, in the Type A series.

15. Taking t = 0, ±1, ±2, ±3, plot (52) on the same axes when (a) $\alpha_3 = 0$ and $\alpha_4 = 3$, (b) $\alpha_3 = -1.2$ and $\alpha_4 = 3$, (c) $\alpha_3 = -1.2$ and $\alpha_4 = 4.2$. In (b) if $\alpha_3 = 1.2$, what effect would this have on the curve?




CHAPTER IV 


JOINT DISTRIBUTIONS OF TWO VARIABLES. THE NORMAL 
CORRELATION SURFACE 

1. Fundamental Notions. Definitions of a frequency function of one variable and the associated notion of probability were given in Chapter III. Corresponding definitions will now be given for an arbitrary probability distribution of two variables. The continuous variables (x, y) have the joint probability function f(x, y) if the double integral of f(x, y) over a region of the (x, y)-plane measures the relative frequency of occurrence of pairs of values (x, y) in that region. It will be understood that f(x, y) is continuous, single-valued, and non-negative. If values of (x, y) are restricted to a finite region we define f(x, y) to be identically zero outside that region. In the extended region of definition, we have

(1)   $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,dy\,dx = 1.$

Geometrically, this means that the volume under the surface represented by z = f(x, y) is unity. Then f(x, y) dy dx is the probability that simultaneously x lies in the interval (x, x + dx) and y lies in the interval (y, y + dy). Consequently,

(2)   $\int_a^b\int_c^d f(x, y)\,dy\,dx$

represents the probability that x lies between a and b at the same time that y lies between c and d.

We shall distinguish between two cases: (a) when the variables are independent in the probability sense, and (b) when they are correlated. Let the probability be g(x) dx that x occurs in dx for all y's. Then integrating over all admissible values of y, we have

(3)   $g(x)\,dx = dx\int_{-\infty}^{\infty} f(x, y)\,dy.$

It is clear that the integral in (3) gives g(x) because the relative frequency of occurrence of x in any interval (a, b) is the relative frequency of pairs (x, y) belonging to the strip of the xy-plane for which a < x < b, and this is

$\int_a^b\int_{-\infty}^{\infty} f(x, y)\,dy\,dx = \int_a^b g(x)\,dx.$

Similarly, if h(y) dy is the probability that y occurs in dy for all assignments of x, we have

(4)   $h(y)\,dy = dy\int_{-\infty}^{\infty} f(x, y)\,dx.$

In accordance with convention we shall call g(x) and h(y) the marginal distributions.

The independence of x and y is characterized by the following

Definition. The variables x and y are independent when f(x, y) = g(x)h(y). If f(x, y) cannot be expressed identically as the product of the marginal distributions, then x and y are said to be correlated.

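2. Moments. Let the general product moment about the common origin of x and y be defined as follows:

(5)   $\nu_{mn} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,x^m y^n\,dy\,dx.$

If m = 0 and n = 1, we have

(6)   $\nu_{01} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,y\,dy\,dx.$

Let f(x, y) be a function in which the order of integration may be interchanged. Then $\nu_{01}$ becomes

$\int_{-\infty}^{\infty}\left[\int_{-\infty}^{\infty} f(x, y)\,dx\right] y\,dy = \int_{-\infty}^{\infty} h(y)\,y\,dy,$

which is the mean, $\bar{y}$, of the y's. Similarly, the mean of the x's is

(7)   $\nu_{10} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,x\,dy\,dx = \int_{-\infty}^{\infty} g(x)\,x\,dx.$

We will now define the general product moment about the means $(\bar{x}, \bar{y})$ as follows:

(8)   $\mu_{mn} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x - \bar{x})^m (y - \bar{y})^n f(x, y)\,dy\,dx.$

When m = n = 1, we have

(9)   $\mu_{11} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x - \bar{x})(y - \bar{y})\,f(x, y)\,dy\,dx,$

which is styled the co-variance of the joint distribution.

When m = 2 and n = 0, we have the variance of x,

(10)   $\mu_{20} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x - \bar{x})^2 f(x, y)\,dy\,dx = \int_{-\infty}^{\infty} (x - \bar{x})^2 g(x)\,dx = \sigma_x^2.$

Similarly, when m = 0 and n = 2, we have the variance of y,

(11)   $\mu_{02} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (y - \bar{y})^2 f(x, y)\,dy\,dx = \int_{-\infty}^{\infty} (y - \bar{y})^2 h(y)\,dy = \sigma_y^2.$

It is left as an exercise for the student to show that

(12)   $\mu_{11} = \nu_{11} - \nu_{10}\nu_{01}, \qquad \mu_{20} = \nu_{20} - \nu_{10}^2.$

The coefficient of correlation between x and y, denoted by $\rho_{xy}$, is defined by

(13)   $\rho_{xy} = \dfrac{\mu_{11}}{\sigma_x \sigma_y}.$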


3. Regression. If y has been assigned in the joint probability function f(x, y), the probability that x will lie in an infinitesimal interval is

$\dfrac{f(x, y)}{h(y)}\,dx.$

Thus, when y is fixed,

$\int_{-\infty}^{\infty} \dfrac{f(x, y)}{h(y)}\,dx = 1,$

and so f(x, y)/h(y) is the probability function of x for a fixed y. It may be called the probability density representing a y array of x's.

Likewise, if we fix x the probability density for an x array of y's is given by f(x, y)/g(x), since

$\int_{-\infty}^{\infty} \dfrac{f(x, y)}{g(x)}\,dy = 1$

when x is fixed.

The notion of arrays may be made more concrete by thinking of a joint distribution of the heights and weights of men. If x refers to weight and y to height, then an example of an x array of y's is the distribution of the heights of all men who weigh 150 pounds, and the weights of all men who are six feet tall is an example of a y array of x's.

The mean of an x array of y's is

(14)   $\bar{y}_x = \int y\,\dfrac{f(x, y)}{g(x)}\,dy$

where the integration is performed over all values in the array defined by x. Similarly, the mean of a y array of x's is

(15)   $\bar{x}_y = \int x\,\dfrac{f(x, y)}{h(y)}\,dx$

integrated over all x's in an array for a fixed y.

The variance in an x array of y's is given by

(16)   $\int (y - \bar{y}_x)^2\,\dfrac{f(x, y)}{g(x)}\,dy$

integrated over all values in the array fixed by x. Similarly, the variance in a y array of x's is

(17)   $\int (x - \bar{x}_y)^2\,\dfrac{f(x, y)}{h(y)}\,dx$

integrated over all values in the array fixed by y.

Taking different x arrays of y's fixes the mean points $\bar{y}_x$ and as x varies continuously we get the locus of these means which is called the regression curve of y on x. Its equation is given by (14) where now, of course, x is a variable. Similarly, (15) gives the regression curve of x on y. Of particular interest and use are the cases in which these regression curves are straight lines. If the equation of the regression curve of y on x is of the form

$\bar{y}_x = Ax + B,$

then the regression of y on x is said to be linear. Similarly, if the equation of the regression curve of x on y is of the form

$\bar{x}_y = Cy + D,$

then the regression of x on y is said to be linear. If one regression system is linear the other is not necessarily linear.

Let us now consider the implications of linear regression on the joint probability function f(x, y) and the marginal totals g(x) and h(y). Consider

$\bar{y}_x = \int y\,\dfrac{f(x, y)}{g(x)}\,dy = Ax + B$

or

(18)   $\int y f(x, y)\,dy = Ax\,g(x) + B\,g(x).$

Integrating each side of (18) with respect to x, and remembering that we may interchange the order of integration, we have

$\int\left[\int y f(x, y)\,dy\right] dx = A\int x\,g(x)\,dx + B\int g(x)\,dx,$

or

(19)   $\nu_{01} = A\nu_{10} + B.$

Multiplying each side of (18) by x and integrating with respect to x, we have

$\int x\left[\int y f(x, y)\,dy\right] dx = A\int x^2 g(x)\,dx + B\int x\,g(x)\,dx.$

Since the left member is

$\int\int xy f(x, y)\,dy\,dx = \nu_{11},$

we have

(20)   $\nu_{11} = A\nu_{20} + B\nu_{10}.$

A simultaneous solution of (19) and (20) yields

$A = \dfrac{\nu_{11} - \nu_{10}\nu_{01}}{\nu_{20} - \nu_{10}^2} = \dfrac{\mu_{11}}{\mu_{20}} = \rho\,\dfrac{\sigma_y}{\sigma_x},$

$B = \nu_{01} - \nu_{10}\,\dfrac{\mu_{11}}{\mu_{20}} = \bar{y} - \bar{x}\,\rho\,\dfrac{\sigma_y}{\sigma_x}.$

Therefore the equation of the line of regression of y on x becomes

(21)   $\bar{y}_x - \bar{y} = \rho\,\dfrac{\sigma_y}{\sigma_x}\,(x - \bar{x}).$

In an analogous manner, if the regression of x on y is linear the regression line has the equation

(22)   $\bar{x}_y - \bar{x} = \rho\,\dfrac{\sigma_x}{\sigma_y}\,(y - \bar{y}).$

The quantities $A = \rho(\sigma_y/\sigma_x)$ and $C = \rho(\sigma_x/\sigma_y)$ are called the regression coefficients. It is obvious that their product is $\rho^2$.



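Example. Given

$f(x, y) = \dfrac{2}{a^2}, \qquad 0 \leq x \leq y, \quad 0 \leq y \leq a,$

as the joint probability function of two variables x and y. Find (i) the marginal totals g(x) and h(y); (ii) the mean and variance of each of the marginal totals, i.e., $\nu_{10}$ and $\sigma_x^2 = \mu_{20}$ for g(x), $\nu_{01}$ and $\sigma_y^2 = \mu_{02}$ for h(y); (iii) the equations of the regression curves of y on x and of x on y, $\bar{y}_x$ and $\bar{x}_y$; (iv) the correlation coefficient $\rho$.

Solutions. The volume under the surface represented by the given function is unity. Thus

$\int_0^a\int_0^y z\,dx\,dy = \dfrac{2}{a^2}\int_0^a y\,dy = 1.$

The surface is shown in the accompanying figure.

(i) The marginal totals are

$g(x) = \int_x^a \dfrac{2}{a^2}\,dy = \dfrac{2}{a^2}(a - x),$

$h(y) = \int_0^y \dfrac{2}{a^2}\,dx = \dfrac{2y}{a^2}.$

(ii) The means are

$\nu_{10} = \int_0^a x\,\dfrac{2}{a^2}(a - x)\,dx = \dfrac{a}{3},$

$\nu_{01} = \int_0^a y\,\dfrac{2y}{a^2}\,dy = \dfrac{2a}{3};$

the variances are

$\mu_{20} = \sigma_x^2 = \int_0^a x^2\,\dfrac{2}{a^2}(a - x)\,dx - \dfrac{a^2}{9} = \dfrac{a^2}{6} - \dfrac{a^2}{9} = \dfrac{a^2}{18},$

$\mu_{02} = \sigma_y^2 = \dfrac{a^2}{2} - \dfrac{4a^2}{9} = \dfrac{a^2}{18}.$

(iii) The regression lines are

$\bar{y}_x = \int_x^a y\,\dfrac{2/a^2}{2(a - x)/a^2}\,dy = \dfrac{a + x}{2},$

$\bar{x}_y = \int_0^y x\,\dfrac{2/a^2}{2y/a^2}\,dx = \dfrac{y}{2}.$

(iv) From the equations of the regression lines it follows that $\rho^2 = \tfrac{1}{4}$ and $\rho = \tfrac{1}{2}$ since $\rho(\sigma_y/\sigma_x)$ is positive.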
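The results of this example can be confirmed in exact arithmetic. The sketch below takes a = 1 (an illustrative choice) and relies on the closed form ν_mn = 2/((m + 1)(m + n + 2)), which follows from integrating x^m y^n · 2/a² over the triangle 0 ≤ x ≤ y ≤ 1; that formula and the function name are working assumptions of this illustration, not statements from the text.

```python
from fractions import Fraction

def nu(m, n):
    """Product moment nu_mn of f = 2/a^2 on 0 <= x <= y <= a, taking a = 1."""
    return Fraction(2, (m + 1) * (m + n + 2))

mu11 = nu(1, 1) - nu(1, 0) * nu(0, 1)   # co-variance, by (12)
mu20 = nu(2, 0) - nu(1, 0) ** 2         # variance of x
mu02 = nu(0, 2) - nu(0, 1) ** 2         # variance of y

A = mu11 / mu20                         # regression coefficient of y on x
C = mu11 / mu02                         # regression coefficient of x on y
B = nu(0, 1) - A * nu(1, 0)             # intercept of the line y_x = A x + B

print(nu(1, 0), nu(0, 1), mu20, mu02)
print(A, B, C, A * C)
```

The product A·C gives ρ², and the slope and intercept of the first regression line should match (a + x)/2 with a = 1.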

4. The Standard Error of Estimate. We have seen that the probability density in an x array of y's is f(x, y)/g(x). Then the variance $s_{y \cdot x}^2$ within such an array is

$s_{y \cdot x}^2 = \int (y - \bar{y}_x)^2\,\dfrac{f(x, y)}{g(x)}\,dy.$

The mean, over all x arrays, of values of $s_{y \cdot x}^2$ weighted with the marginal distribution of x is denoted by $S_y^2$, and $S_y$ is called the standard error of estimate. We will now show that $S_y^2 = \sigma_y^2(1 - \rho^2)$. By definition,

$S_y^2 = \int g(x)\,s_{y \cdot x}^2\,dx = \int\int (y - \bar{y}_x)^2 f(x, y)\,dy\,dx.$

Using the value of $\bar{y}_x$ given in (21) the above expression becomes

$S_y^2 = \int\int \left\{y - \bar{y} - \rho\,\dfrac{\sigma_y}{\sigma_x}(x - \bar{x})\right\}^2 f(x, y)\,dy\,dx$

$= \int\int \left\{(y - \bar{y})^2 - 2\rho\,\dfrac{\sigma_y}{\sigma_x}(y - \bar{y})(x - \bar{x}) + \rho^2\,\dfrac{\sigma_y^2}{\sigma_x^2}(x - \bar{x})^2\right\} f(x, y)\,dy\,dx,$

and the right member simplifies so that we have the result

$S_y^2 = \sigma_y^2(1 - \rho^2).$

From this result it follows that

$-1 \leq \rho \leq 1.$

5. The Normal Correlation Surface. We shall now consider a joint probability function of special interest. The normal correlation surface is defined by the following function

(23)   $f(x, y) = Ke^{-P},$

where

$P = \dfrac{1}{2(1 - \rho^2)}\left\{\dfrac{x^2}{\sigma_x^2} - \dfrac{2\rho xy}{\sigma_x\sigma_y} + \dfrac{y^2}{\sigma_y^2}\right\},$

$K = \dfrac{1}{2\pi\sigma_x\sigma_y(1 - \rho^2)^{1/2}},$

$-\infty < x < \infty, \qquad -\infty < y < \infty,$

and the variables x and y have the origin of their reference system at their respective means, that is,

(24)   $\bar{x} = \int x\,g(x)\,dx = 0, \qquad \bar{y} = \int y\,h(y)\,dy = 0.$

These conditions (24) may be imposed without essential loss of generality and will simplify the algebraic discussion.




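The marginal distribution of x is given by

$g(x) = \int_{-\infty}^{\infty} f(x, y)\,dy$

$= Ke^{-x^2/2\sigma_x^2}\int_{-\infty}^{\infty} \exp\left\{-\dfrac{[y - \rho(\sigma_y/\sigma_x)x]^2}{2(1 - \rho^2)\sigma_y^2}\right\} dy$

$= Ke^{-x^2/2\sigma_x^2}\int_{-\infty}^{\infty} e^{-y'^2/2(1 - \rho^2)\sigma_y^2}\,dy'$

$= Ke^{-x^2/2\sigma_x^2}\,\{2\pi(1 - \rho^2)\}^{1/2}\sigma_y$

(25)   $= \dfrac{1}{\sigma_x\sqrt{2\pi}}\,e^{-x^2/2\sigma_x^2}.$

Similarly, the marginal distribution of y is

(26)   $h(y) = \dfrac{1}{\sigma_y\sqrt{2\pi}}\,e^{-y^2/2\sigma_y^2}.$

Hence we may state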

Theorem I. If two variables are normally correlated, each variable 
is normally distributed in its marginal totals. 

That the converse is not necessarily true is shown by the following illustration. Consider a clay model of a normal correlation surface such that its marginal totals are necessarily normal distributions by the above theorem. Quantities of the clay can be redistributed by piling up in certain spots the clay that is scooped out in other spots in such a way that the marginal totals are not disturbed. It is obvious that the resulting surface is not one that is defined by (23).

Other interesting properties of normally correlated variables are described by the following theorems.

Theorem II. The regression systems of a normal correlation surface are linear.

The proof is a matter of integration. Let us find the probability function of an x array of y's. By definition, this is given by f(x, y)/g(x). To get the mean of such an array we must multiply its probability distribution by y and integrate over all values of y in the array. Thus we have

$\bar{y}_x = \dfrac{\int_{-\infty}^{\infty} y f(x, y)\,dy}{g(x)}$

$= \int_{-\infty}^{\infty} \dfrac{y}{\sigma_y\{2\pi(1 - \rho^2)\}^{1/2}} \exp\left\{-\dfrac{[y - \rho(\sigma_y/\sigma_x)x]^2}{2\sigma_y^2(1 - \rho^2)}\right\} dy$

$= \dfrac{x\rho\sigma_y}{\sigma_x}.$

In the exercises at the end of the chapter the student is asked to verify the above result. If x is allowed to vary over the arrays, it is evident that the locus of the means of the x arrays of y's is the line

(27)   $\bar{y}_x = \rho\,\dfrac{\sigma_y}{\sigma_x}\,x.$

In a similar way the mean of a y array of x's is given by

$\bar{x}_y = \dfrac{\int_{-\infty}^{\infty} x f(x, y)\,dx}{h(y)} = \dfrac{y\sigma_x\rho}{\sigma_y}$

and this lies on the regression line

(28)   $\bar{x}_y = \rho\,\dfrac{\sigma_x}{\sigma_y}\,y.$

While it is an intrinsic property of a normal correlation surface 
that both regressions are linear, one should not infer that this is 
characteristic of joint probability functions in general. One or both 
or neither of the regression systems of an arbitrary distribution 
function may be linear. The student will observe that the definition 
of the correlation coefficient did not involve the condition that 
f(x, y) was normal nor that regression was linear. Although the
definition of a correlation coefficient does not require linear regression, 
nevertheless the correlation coefficient may fail to measure the 
correlation in the case of appreciable non-linear regression. 

Theorem III. If x and y are normally correlated, then each array is a normal distribution with constant variance $S_y^2$ from one array of y's to another and constant variance $S_x^2$ from one array of x's to another.




The proof consists in exhibiting the distribution function for an x array of y's and for a y array of x's. Thus, for the first case we have

(29)   $\dfrac{f(x, y)}{g(x)} = \dfrac{1}{\sqrt{2\pi}\,S_y}\,e^{-[y - \rho(\sigma_y/\sigma_x)x]^2/2S_y^2}$

where $S_y^2 = \sigma_y^2(1 - \rho^2)$. Evidently, this is a normal distribution with variance $S_y^2$ which is independent of x and therefore is constant over all x arrays. It is left as an exercise for the student to give the companion proof for the arrays in the y direction.

When the variance is constant over the arrays in the x direction the regression system of y on x is said to be homoscedastic (equally scattered). Similarly for the y direction. A geometrical representation of a normal correlation surface is given in Part I, § 18 of Chapter VIII.
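Theorems II and III can be illustrated by simulation. The sketch below is not from the text: it generates normally correlated pairs by the common construction y = σ_y(ρu + (1 − ρ²)^{1/2} v) from independent standard normal u and v, which is an assumption of this illustration, and the parameter values, sample size, and function names are likewise arbitrary.

```python
import random
from statistics import fmean

def sample_pairs(rho, sigma_x, sigma_y, size, seed=0):
    """Draw (x, y) pairs with zero means from a normal correlation surface."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(size):
        u, v = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x = sigma_x * u
        y = sigma_y * (rho * u + (1.0 - rho * rho) ** 0.5 * v)
        pairs.append((x, y))
    return pairs

def fitted_slope_and_scatter(pairs):
    """Slope mu_11 / mu_20 of y on x and the mean squared deviation from that line."""
    mean_x = fmean(x for x, _ in pairs)
    mean_y = fmean(y for _, y in pairs)
    mu11 = fmean((x - mean_x) * (y - mean_y) for x, y in pairs)
    mu20 = fmean((x - mean_x) ** 2 for x, _ in pairs)
    slope = mu11 / mu20
    scatter = fmean((y - mean_y - slope * (x - mean_x)) ** 2 for x, y in pairs)
    return slope, scatter

pairs = sample_pairs(rho=0.6, sigma_x=1.0, sigma_y=2.0, size=50_000)
slope, scatter = fitted_slope_and_scatter(pairs)
print(slope, scatter)
```

For these parameters the slope should lie near ρσ_y/σ_x = 1.2 and the scatter near S_y² = σ_y²(1 − ρ²) = 2.56, up to sampling fluctuation.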

6. Limiting Forms. Suppose a plane is passed through the surface defined by (23) parallel to the xy-plane. Analytically, this means that we let f(x, y) = c where c is some constant less than the maximum value of the function, that is, we take 0 < c < K to insure a real intersection. We obtain

(30)   $\dfrac{x^2}{\sigma_x^2} - \dfrac{2\rho xy}{\sigma_x\sigma_y} + \dfrac{y^2}{\sigma_y^2} = \lambda^2,$

where

(30a)   $\lambda^2 = 2(1 - \rho^2)\log\dfrac{K}{c},$

which is obviously not negative. Thus the points (x, y) for which the probability density is constant lie on an ellipse.

It is easier to study (30) if we transform the variables to standard units by letting $t_x = x/\sigma_x$ and $t_y = y/\sigma_y$. Then (30) becomes

(31)   $t_x^2 - 2\rho\,t_x t_y + t_y^2 = \lambda^2.$

The cross-product term will vanish under the transformations

$t_x = u\cos\theta - v\sin\theta,$
$t_y = u\sin\theta + v\cos\theta$

when $\theta = \pi/4$. So the required rotation formulas are

(32)   $t_x = \dfrac{u - v}{2^{1/2}}$   and   $t_y = \dfrac{u + v}{2^{1/2}}.$

Applying these to (31) we obtain

(33)   $u^2(1 - \rho) + v^2(1 + \rho) = \lambda^2$

which may be written in the standard form

(34)   $\dfrac{u^2}{a^2} + \dfrac{v^2}{b^2} = 1,$

where

$a^2 = \dfrac{\lambda^2}{1 - \rho}$   and   $b^2 = \dfrac{\lambda^2}{1 + \rho}.$

The eccentricity of the ellipse (34) is $(1 - b^2/a^2)^{1/2} = [2\rho/(1 + \rho)]^{1/2}$. We see that $b \to a$ as $\rho \to 0$. When $\rho = 0$, $b = a = \lambda$. Then (34) would be a circle, and (23) would be a surface of revolution if the variables were expressed in standard units. When $\rho = 1$, it follows from (33) and (30a) that v = 0. From (32) it is seen that the line v = 0 is the same as $t_y = t_x$, and the ellipse has degenerated into a straight line. The surface then shrinks into a normal curve in the plane $t_y = t_x$.
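The dependence of the ellipse (34) on ρ is readily tabulated. The following sketch uses hypothetical function names, takes λ = 1 for illustration, and is restricted to 0 ≤ ρ < 1 so that the square roots are real.

```python
from math import sqrt

def semi_axes(rho, lam=1.0):
    """Semi-axes a and b of (34): a^2 = lam^2/(1 - rho), b^2 = lam^2/(1 + rho)."""
    return lam / sqrt(1 - rho), lam / sqrt(1 + rho)

def eccentricity(rho):
    """(1 - b^2/a^2)^(1/2) = [2 rho / (1 + rho)]^(1/2)."""
    return sqrt(2 * rho / (1 + rho))

for rho in (0.0, 0.3, 0.6, 0.9):
    a, b = semi_axes(rho)
    print(rho, a / b, eccentricity(rho))
```

At ρ = 0 the ratio a/b is 1 (a circle), and the ratio grows without bound as ρ approaches 1, in keeping with the degeneration into a straight line described above.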

7. Tetrachoric Correlation. The word tetrachoric refers to a 2 × 2 fold table. Suppose N objects are classified according as they possess one or both or neither of two qualitative traits or attributes which may, for convenience, be denoted by I and II. Such a classification will yield a four fold table as shown in Table 4,

Table 4

             Not II     II         Total
Not I        a          b          a + b
I            c          d          c + d
Total        a + c      b + d      N

where a + b + c + d = N, the four classes being mutually exclusive but not necessarily exhaustive. The attributes may sometimes admit also of quantitative measurement but we are considering only the case where they are classified in dichotomy, such as "tall" and "not tall," "male" and "female," "alive" and "dead," "good" and "bad," "dull" and "not dull," etc. An example is the following classification of 26,287 children where attribute I is dullness and attribute II is developmental defects.

Table 5. (K. Pearson, Tables, p. li)

             Without Defects    With Defects    Totals
Not Dull     22,793             1,420           24,213
Dull         1,186              888             2,074
Totals       23,979             2,308           26,287


The problem in such classifications is to measure the intensity 
of association between the two attributes in the set. Let us suppose 
that our data had been given initially so that a fine division into 
many cells was possible and that the result would have presented 
a normal correlation surface. If this surface were then divided into 
four cells by planes x = h and y = k to yield the relative frequencies 
observed, then the correlation coefficient that characterizes this 
normal correlation surface is called tetrachoric r. It will be denoted 
by r t . 

K. Pearson has given a method and tables for determining $r_t$. (Cf. Tables for Statisticians and Biometricians, Part I.) The procedure may be indicated by the following diagram and skeleton solution for our example, Table 5. (The details will be found in the reference cited.)

Solution of Example. (See Figure 11.)

$\int_h^{\infty} \phi(t)\,dt = \dfrac{2074}{26287} = .078,898, \qquad h = 1.413;$

$\int_k^{\infty} \phi(t)\,dt = \dfrac{2308}{26287} = .087,800, \qquad k = 1.354.$

Entering Pearson's Tables for the above values of h and k and interpolating, it is found that $r_t = .652$.
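The dividing values h and k in this solution can be recovered without tables from the inverse of the normal probability integral. The sketch below covers only that first step; it does not attempt Pearson's determination of r_t itself, and the variable names are illustrative.

```python
from statistics import NormalDist

total = 26287
dull = 2074            # marginal total of attribute I (Table 5)
with_defects = 2308    # marginal total of attribute II (Table 5)

standard_normal = NormalDist()   # mean 0, standard deviation 1

# h and k cut off upper tails equal to the observed marginal proportions.
h = standard_normal.inv_cdf(1 - dull / total)
k = standard_normal.inv_cdf(1 - with_defects / total)

print(dull / total, h)
print(with_defects / total, k)
```

The printed proportions should reproduce .078,898 and .087,800, and the dividing values should agree with h = 1.413 and k = 1.354 to about three decimals.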

The determination of $r_t$ by Pearson's method is rather tedious when $.2 \leq |r_t| \leq .8$. This burden has been lifted by two fairly recent publications. Camp has given in his text (pp. 307-310) an ingenious and simple method for approximating $r_t$. His scheme is interesting from the mathematical as well as the practical point of view. In Computing Diagrams for the Tetrachoric Correlation Coefficient by Thurstone et al. (available at the University of Chicago Bookstore), a useful approximation to $r_t$ can be determined by inspection.

Fig. 11.   $H_1 = \int_{-\infty}^{h} \phi(t)\,dt, \qquad H_2 = \int_{-\infty}^{k} \phi(t)\,dt$

Exercises 

1. Show that the definition of $\rho$ may be written in the form

$\rho = \dfrac{1}{\sigma_x\sigma_y}\left\{\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy\,f(x, y)\,dy\,dx - \bar{x}\bar{y}\right\}.$

2. Given that $f(x, y) = 2/a^2$, $0 \leq x \leq y$, $0 \leq y \leq a$. Show that both regression systems are linear. Evaluate $\rho$.

3. Derive (22).

4. Prove that the area of the ellipse (30) is $\pi\lambda^2\sigma_x\sigma_y/(1 - \rho^2)^{1/2}$.

5. (a) If $\rho = .6$ show that the ratio between the major and minor axes of the ellipse is 2.

(b) Show that the slope of the regression line of y on x for a normal correlation surface is $\rho/(1 - \rho^2)^{1/2}$ in units of $S_y$ and $\sigma_x$.



77 


Joint Distributions of Two Variables 


6. Establish the truth or falsity of the following proposition: A necessary and 

sufficient condition that two variables be normally correlated is that their 
regression systems be linear. 

7. Prove that the regression systems of two normally correlated variables are 

linear and homoscedastic. 


8. For (23) prove the following:

(a) the mean value of $y_x$ taken over all values of x is zero,

(b) the variance of $y_x$ is equal to $\rho^2 \sigma_y^2$,

(c) the correlation coefficient between $y_x$ and y is equal to $\rho$.

Hints. (a) Evaluate $\int\!\!\int y_x\, f(x, y)\, dy\, dx$,

(b) Evaluate $\int\!\!\int y_x^2\, f(x, y)\, dy\, dx$.

(c) Evaluate $\int\!\!\int$ …


9. If x and y are discrete variables, $\rho$ is defined by

$\rho = \frac{E(xy) - E(x)E(y)}{\sigma_x \sigma_y},$

where

$\sigma_x = [E(x^2) - \{E(x)\}^2]^{1/2}, \quad \sigma_y = [E(y^2) - \{E(y)\}^2]^{1/2},$

and

$E(x) = \sum_i x_i\, g(x_i),$

$E(y) = \sum_j y_j\, h(y_j),$

$E(xy) = \sum_i \sum_j x_i y_j\, f(x_i, y_j),$

$f(x_i, y_j)$ being the probability for the simultaneous occurrence of the pair
of values $(x_i, y_j)$, $g(x_i) = \sum_j f(x_i, y_j)$ and $h(y_j) = \sum_i f(x_i, y_j)$ being the
marginal distributions of x and y, respectively. Find $\rho$ for the table in
Exercise 8, § 13, Chapter I.

10. Investigate the references given for tetrachoric r and give a report on the 
results of your study. 
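
The discrete definition of $\rho$ in Exercise 9 lends itself to a mechanical check. The following Python sketch is offered only as an illustration: the function name `discrete_rho` and the 2 x 2 probability table are assumptions made for the example, and the table is not the one referred to in Exercise 8, § 13, Chapter I.

```python
from math import sqrt

def discrete_rho(xs, ys, f):
    """Correlation coefficient of a discrete joint distribution.

    xs, ys : the possible values x_i and y_j
    f      : f[i][j] = probability of the pair (x_i, y_j); entries sum to 1
    """
    # Marginal distributions g(x_i) and h(y_j)
    g = [sum(row) for row in f]
    h = [sum(f[i][j] for i in range(len(xs))) for j in range(len(ys))]

    # Expectations E(x), E(y), E(x^2), E(y^2), E(xy)
    ex = sum(x * gi for x, gi in zip(xs, g))
    ey = sum(y * hj for y, hj in zip(ys, h))
    ex2 = sum(x * x * gi for x, gi in zip(xs, g))
    ey2 = sum(y * y * hj for y, hj in zip(ys, h))
    exy = sum(xs[i] * ys[j] * f[i][j]
              for i in range(len(xs)) for j in range(len(ys)))

    sigma_x = sqrt(ex2 - ex ** 2)
    sigma_y = sqrt(ey2 - ey ** 2)
    return (exy - ex * ey) / (sigma_x * sigma_y)

# Hypothetical 2 x 2 table, chosen only for illustration
xs = [0, 1]
ys = [0, 1]
f = [[0.4, 0.1],
     [0.1, 0.4]]
rho = discrete_rho(xs, ys, f)
```

For this hypothetical table E(x) = E(y) = .5, E(xy) = .4 and both standard deviations are .5, so the sketch returns a value of $\rho$ equal to .6 up to rounding.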




CHAPTER V 

MULTIPLE AND PARTIAL CORRELATION 

1. Notation. Simple correlation theory deals with co-variation 
in two variables. If other factors are involved the two variables 
are assessed as the important ones for the investigation and the 
other factors are ignored. But situations frequently arise in the 
fields of agriculture, biology, economics, education, and psychology, 
which call for consideration of three or more influences bearing 
simultaneously on a problem, and hence for the investigation of 
interrelations among three or more variables. For example, crop 
yield varies with soil fertility, rainfall, and temperature; wheat 
production is affected by acreage planted and yield per acre; students’
honor points are connected with intelligence, health, hours
of study, etc.; their chest measurements vary with stature and 
weight. 

The term multiple correlation refers to a theory of correlation
involving three or more variables. For ease in exposition we shall
restrict the derivation of formulas to the three-variable case although
the method is perfectly general. When the three-variable case is
understood the formulas can be generalized for k variables.

The framework of a two-way table was a rectangle in the xy-plane
which was divided into cells by lines parallel to the axes. The
analogue in the case of three variables, which we shall denote by
x, y, and z, is a rectangular parallelepiped divided into cells by slicing
planes parallel to the axes.

We shall denote the frequency in the cell whose mid-point has the
coordinates (x, y, z) by f(x, y, z). A pair of (x, y) values fixes a
z column (Figure 12), and the sum of the frequencies in such a
column is the “column total”

(1)    $\sum_z f(x, y, z) = f(x, y),$

where here and subsequently the symbol $\sum$ together with the
variable underneath denotes a summation in the direction of that
variable. Now consider all those columns which have the same
y. Their total frequency, denoted by

(2)    $\sum_x f(x, y) = f(y),$

may appropriately be called a “slab total” (Figure 13).



Finally, if we add all the slab totals we get the total frequency N.
Thus

(3)    $\sum_y f(y) = N.$

By making use of (1) we may, if we wish, express (2) as the double sum

(4)    $\sum_x \sum_z f(x, y, z) = f(y),$

and using (4) we may express (3) as the triple sum

(5)    $\sum_x \sum_y \sum_z f(x, y, z) = N.$
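
The summations (1) to (5) may be illustrated by a short Python sketch. The nested-list layout `f[x][y][z]`, the function names, and the small frequency array are assumptions made only for illustration; they do not come from the text.

```python
def column_totals(f):
    """Sum over z: f[x][y][z] -> f_xy[x][y], the column totals of (1)."""
    return [[sum(z_column) for z_column in plane] for plane in f]

def slab_totals(f_xy):
    """Sum over x: f_xy[x][y] -> f_y[y], the slab totals of (2)."""
    n_y = len(f_xy[0])
    return [sum(row[y] for row in f_xy) for y in range(n_y)]

# Hypothetical 2 x 3 x 2 frequency array, indexed f[x][y][z]
f = [
    [[1, 2], [0, 3], [4, 1]],
    [[2, 2], [5, 0], [1, 3]],
]

f_xy = column_totals(f)   # column totals, equation (1)
f_y = slab_totals(f_xy)   # slab totals, equations (2) and (4)
N = sum(f_y)              # total frequency, equations (3) and (5)
```

Summing the slab totals gives the same N as summing every cell directly, which is the content of (5).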

(a) The aggregate of the column totals f(x, y) forms a two-way
frequency table. If we imagine the numerical values of these frequencies
written in the cells of the xy-plane it is easy to see that
they constitute a correlation table (Figure 14). For this table, the
simple correlation coefficient $r_{xy}$ is called the total correlation (in
contradistinction to a partial correlation coefficient to be defined
later) and the regression curves are called the total regressions of
y on x and x on y. Discussions analogous to (a) will now be given
for horizontal columns parallel (b) to Ox and (c) to Oy.



(b) A pair of (y, z) values fixes an x column parallel to Ox. The
sum of the frequencies in an x column is

(6)    $\sum_x f(x, y, z) = f(y, z).$

If we add all those columns which have the same z we get a slab
perpendicular to z whose total is

(7)    $\sum_y f(y, z) = f(z).$

Finally, the total of all such slabs is

(8)    $\sum_z f(z) = N.$

The numerical values of the totals f(y, z) written, if desired, in the
cells of the yz-plane form a two-way correlation table, as represented in
Figure 15. For this table, $r_{yz}$ is the total correlation coefficient
between y and z, and the regression curves are the total regressions
of y on z and z on y.

(c) Similarly, a pair of (x, z) values fixes a y column parallel to Oy.
The sum of the frequencies in such a column is

(9)    $\sum_y f(x, y, z) = f(x, z).$

If we add all the columns which have the same x we get a slab
perpendicular to x whose total is

(10)    $\sum_z f(x, z) = f(x).$

The sum of all such slabs is

(11)    $\sum_x f(x) = N.$

The numerical values of the column totals f(x, z) constitute a two-way
correlation table whose correlation coefficient $r_{xz}$ is the total
correlation between x and z. The total regressions of x on z and z on x
are given by the regression curves of this table (Figure 16).






2. Regression. The mean of a z column at (x, y) is defined by

(12)    $\bar{z}(x, y) = \frac{1}{f(x, y)} \sum_z z\, f(x, y, z).$

Similarly, the mean of an x column at (y, z) is

(13)    $\bar{x}(y, z) = \frac{1}{f(y, z)} \sum_x x\, f(x, y, z),$

and the mean of a y column at (x, z) is

(14)    $\bar{y}(x, z) = \frac{1}{f(x, z)} \sum_y y\, f(x, y, z).$

The regression plane of z on xy is that plane which fits the means 
of the z columns best in a least-squares sense. This should not be 
confused with the true regression surface, z on xy, which is defined 
as the locus of the mean points of the z columns. More accurately, 
it is the locus of these points as the dimensions of the cells approach 
zero. The regression plane, z on xy, is that plane which fits best the 
true regression surface, z on xy. Corresponding statements hold for 
the regression planes of y on xz and of x on yz. 

So far, it was convenient to designate our variables by the conventional
letters used in representing three-dimensional space. We
are now about to obtain the equations of the regression planes and
in order to extend our results to k variables
it will be desirable to change to a new set
of symbols which will lend themselves more
readily to generalization. The switch will
cause no difficulty. We shall now use $x_1$ in
place of z, $x_2$ in place of x, and $x_3$ in place
of y. The relations between the r’s in the
old notation and the new are $r_{xy} = r_{23}$,
$r_{yz} = r_{13}$, $r_{xz} = r_{12}$. The adjacent diagram
will help us keep in mind the relations between the new symbols
and the old.

We shall now derive the equation of the regression plane of $x_1$
on $x_2$ and $x_3$. In determining, under a least-squares criterion, the
parameters in its equation it will simplify the exposition if we assume
that the variables are measured from their respective means as origin.
This may be assumed without loss of generality. Let the desired
equation be of the form

(15)    $x_1 = A x_2 + B x_3 + C.$




Then we may determine the parameters in (15) so that the sum of
the squares of the residuals

(16)    $U = \sum_{1,2,3} (x_1 - A x_2 - B x_3 - C)^2 f$

is a minimum, f being short for $f(x_1, x_2, x_3)$, and $\sum_{1,2,3}$ for
$\sum_{x_1} \sum_{x_2} \sum_{x_3}$.
Equating to zero the first partial derivatives of U with respect to
A, B, and C, we obtain the equations

$\sum x_2 (x_1 - A x_2 - B x_3 - C) f = 0,$

$\sum x_3 (x_1 - A x_2 - B x_3 - C) f = 0,$

$C = 0.$

The simplification of the last equation is a consequence of our choice
of origin since $\sum x_1 f = \sum x_2 f = \sum x_3 f = 0$ when the origin of $x_i$
is at the mean of its N values. The first two equations may be
written in the form

(17)    $A \sum x_2^2 f + B \sum x_2 x_3 f = \sum x_1 x_2 f,$
        $A \sum x_2 x_3 f + B \sum x_3^2 f = \sum x_1 x_3 f.$

Let $\sigma_i^2$ be the variance of $x_i$ and let $r_{ij}$ be the correlation coefficient
between $x_i$ and $x_j$. Then by definition,

$\sum x_i^2 f(x_1, x_2, x_3) = N \sigma_i^2,$

$\sum x_i x_j f(x_1, x_2, x_3) = N \sigma_i \sigma_j r_{ij}.$

So (17) becomes

$N A \sigma_2^2 + N B \sigma_2 \sigma_3 r_{23} = N \sigma_1 \sigma_2 r_{12},$

$N A \sigma_2 \sigma_3 r_{23} + N B \sigma_3^2 = N \sigma_1 \sigma_3 r_{13}.$


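Solving for A and B we have

$A = \frac{\sigma_1}{\sigma_2} \frac{\begin{vmatrix} r_{12} & r_{23} \\ r_{13} & 1 \end{vmatrix}}{\begin{vmatrix} 1 & r_{23} \\ r_{23} & 1 \end{vmatrix}}, \qquad B = \frac{\sigma_1}{\sigma_3} \frac{\begin{vmatrix} 1 & r_{12} \\ r_{23} & r_{13} \end{vmatrix}}{\begin{vmatrix} 1 & r_{23} \\ r_{23} & 1 \end{vmatrix}}.$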


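It is convenient both for simplicity and for the purpose of generalizing
to k variables to define the determinant R by

$R = \begin{vmatrix} r_{11} & r_{12} & r_{13} \\ r_{21} & r_{22} & r_{23} \\ r_{31} & r_{32} & r_{33} \end{vmatrix}$

and to let $R_{ij}$ be the cofactor of $r_{ij}$, that is, the minor of $r_{ij}$ including
the sign factor $(-1)^{i+j}$. Thus,

$R_{12} = -\begin{vmatrix} r_{21} & r_{23} \\ r_{31} & r_{33} \end{vmatrix}, \qquad R_{13} = \begin{vmatrix} r_{21} & r_{22} \\ r_{31} & r_{32} \end{vmatrix}.$

Clearly, $r_{11} = r_{22} = r_{33} = 1$, and $r_{12} = r_{21}$, etc., so the expressions
for A and B may be written

$A = -\frac{\sigma_1 R_{12}}{\sigma_2 R_{11}}, \qquad B = -\frac{\sigma_1 R_{13}}{\sigma_3 R_{11}}.$

Hence (15) becomes

(19)    $\frac{x_1}{\sigma_1} R_{11} + \frac{x_2}{\sigma_2} R_{12} + \frac{x_3}{\sigma_3} R_{13} = 0.$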

This equation gives the most probable value of $x_1$ for assigned values
of $x_2$ and $x_3$, provided that the true regression is not far from being
linear and the distribution of each $x_1$ column is nearly symmetrical
so that its mode is close to its mean. It is an important equation
because it shows how, on the average, changes in $x_2$ and $x_3$ affect $x_1$.
The student will observe that the R’s involve only simple correlation
coefficients and that all the necessary computations for the terms in
(19) were explained in Part I.

There are two analogous equations for the regression planes of $x_2$
on $x_1 x_3$, and $x_3$ on $x_1 x_2$, which can be obtained readily from (19) by a
cyclical permutation of the subscripts on x and R. They are

(20)    $\frac{x_2}{\sigma_2} R_{22} + \frac{x_3}{\sigma_3} R_{23} + \frac{x_1}{\sigma_1} R_{21} = 0$

when $x_2$ is the dependent variable, and

(21)    $\frac{x_3}{\sigma_3} R_{33} + \frac{x_1}{\sigma_1} R_{31} + \frac{x_2}{\sigma_2} R_{32} = 0$

when $x_3$ is the dependent variable. Referred to an arbitrary origin
(19) would have been

(19a)    $\frac{X_1 - \bar{X}_1}{\sigma_1} R_{11} + \frac{X_2 - \bar{X}_2}{\sigma_2} R_{12} + \frac{X_3 - \bar{X}_3}{\sigma_3} R_{13} = 0,$

where $X_i - \bar{X}_i = x_i$. Analogous adjustments of (20) and (21) are
obvious when the variables are referred to an arbitrary origin.

The three-dimensional case can now be generalized. By methods
similar to those employed in deriving (15) we can derive the linear
regression equation for k variables. Thus we have the hyperplane $x_1$
on $x_2, x_3, \cdots, x_k$,

(22)    $\frac{x_1}{\sigma_1} R_{11} + \frac{x_2}{\sigma_2} R_{12} + \cdots + \frac{x_k}{\sigma_k} R_{1k} = 0,$

where $R_{ij}$ is the cofactor of $r_{ij}$ in

(23)    $R = \begin{vmatrix} r_{11} & \cdots & r_{1k} \\ \vdots & r_{22} & \vdots \\ r_{k1} & \cdots & r_{kk} \end{vmatrix}.$

When expressed in standard units, (22) becomes

(22a)    $t_1 = -\frac{1}{R_{11}} \sum_{i=2}^{k} R_{1i} t_i,$

where $t_i = x_i/\sigma_i$. Then $t_1$ may be regarded as a weighted mean of
the contributions of the other variables. The factor $R_{1i}$ represents
the force or weight of $t_i$ when all these variables are given an opportunity
to predict the value of $t_1$.

3. Standard Error of Estimate. In Part I (Chapter VIII) we
learned that $S_y^2 = \sigma_y^2 (1 - r^2)$ was a measure of the closeness with
which the means of the x arrays of y clustered about the line of regression
of y on x. $S_y$ was called the standard error of estimate and
the larger r was, the smaller was $S_y$. We now seek an analogous
expression for the three-variable case. To this end let

(24)    $S_{1.23}^2 = \frac{1}{N} \sum_{1,2,3} \delta^2 f(x_1, x_2, x_3),$

where $\delta$ is the distance, measured parallel to the $x_1$-axis, between the
regression plane and the points $(x_1, x_2, x_3)$, $\sum_{1,2,3}$ denoting a summation
over all these points. That is, $\delta$ = (observed $x_1$ - estimated $x_1$),
the estimated $x_1$ being given by (19). Then we may write

$N S_{1.23}^2 = \frac{\sigma_1^2}{R_{11}^2} \sum_{1,2,3} \left( R_{11} \frac{x_1}{\sigma_1} + R_{12} \frac{x_2}{\sigma_2} + R_{13} \frac{x_3}{\sigma_3} \right)^2 f$

$= \frac{\sigma_1^2 N}{R_{11}^2} (R_{11}^2 + R_{12}^2 + R_{13}^2 + 2 R_{11} R_{12} r_{12} + 2 R_{11} R_{13} r_{13} + 2 R_{12} R_{13} r_{23})$

$= \frac{\sigma_1^2 N}{R_{11}^2} \{ R_{11}(R_{11} + r_{12} R_{12} + r_{13} R_{13}) + R_{12}(R_{12} + r_{12} R_{11} + r_{23} R_{13}) + R_{13}(R_{13} + r_{13} R_{11} + r_{23} R_{12}) \}.$

According to Laplace's development of a determinant, the elements 
of any row (or column) and their corresponding cofactors may be 
used to develop R. If, in the resulting expression, the elements 
of this row (or column) are replaced by the corresponding elements 
of some other row (or column) the expression vanishes. Therefore, 
we have 


(25)    $R_{11} + r_{12} R_{12} + r_{13} R_{13} = R,$    (a)
        $R_{12} + r_{12} R_{11} + r_{23} R_{13} = 0,$    (b)
        $R_{13} + r_{13} R_{11} + r_{23} R_{12} = 0.$    (c)

Using (25) in the above derivation we obtain

(26)    $S_{1.23}^2 = \sigma_1^2 \frac{R}{R_{11}}.$

This is a kind of average variance in $x_1$ columns of the observed values
of $x_1$ from its corresponding estimated values on the regression
plane (19). The square root of (26),

(26a)    $S_{1.23} = \sigma_1 \left( \frac{R}{R_{11}} \right)^{1/2},$

is called the standard error of estimating $x_1$ from assigned values of
$x_2$ and $x_3$.
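
A brief Python sketch may help to fix the meaning of (26a). The function names and the numerical values below are hypothetical; the sketch merely evaluates $\sigma_1 (R/R_{11})^{1/2}$ and sets it beside the standard error $\sigma_1 (1 - r_{12}^2)^{1/2}$ of the simple regression of $x_1$ on $x_2$ recalled from Part I.

```python
from math import sqrt

def det_R(r12, r13, r23):
    """Expansion of the 3 x 3 correlation determinant R."""
    return 1.0 + 2.0 * r12 * r13 * r23 - r12 ** 2 - r13 ** 2 - r23 ** 2

def standard_error(s1, r12, r13, r23):
    """S_1.23 = sigma_1 * (R / R11)^(1/2), following (26a)."""
    R = det_R(r12, r13, r23)
    R11 = 1.0 - r23 ** 2
    return s1 * sqrt(R / R11)

# Hypothetical values, not taken from the text
s1, r12, r13, r23 = 2.0, 0.5, 0.3, 0.2

S_multiple = standard_error(s1, r12, r13, r23)  # x1 estimated from x2 and x3
S_simple = s1 * sqrt(1.0 - r12 ** 2)            # x1 estimated from x2 alone
```

For these assumed values R = .68 and $R_{11}$ = .96, so that $S_{1.23}^2$ is about 2.83, somewhat smaller than the value 3 obtained when $x_3$ is left out of account.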



4. Standard Deviation of Estimated Values. Next, we shall
obtain an expression analogous to $\sigma_{Ey}$ of Part I (§ 7, Chapter VIII)
for the standard deviation of the estimated values given by (19).
The mean value of these estimates is zero since $x_1$ is measured from
its mean as origin. Therefore, the variance, $\sigma_{E1}^2$, of the estimated
values of $x_1$ is given by

(27)    $\sigma_{E1}^2 = \frac{1}{N} \sum \frac{\sigma_1^2}{R_{11}^2} \left( R_{12} \frac{x_2}{\sigma_2} + R_{13} \frac{x_3}{\sigma_3} \right)^2 f$

$= \frac{\sigma_1^2}{R_{11}^2} (R_{12}^2 + R_{13}^2 + 2 R_{12} R_{13} r_{23})$

$= \frac{\sigma_1^2}{R_{11}^2} \{ R_{12}(R_{12} + R_{13} r_{23}) + R_{13}(R_{13} + R_{12} r_{23}) \}$

$= \frac{\sigma_1^2}{R_{11}^2} \{ -R_{12} R_{11} r_{12} - R_{13} R_{11} r_{13} \}$    by (b) and (c) of (25)

$= \frac{\sigma_1^2}{R_{11}} (R_{11} - R)$    by (a) of (25).

Hence we have

(28)    $\sigma_{E1} = \sigma_1 \left( 1 - \frac{R}{R_{11}} \right)^{1/2}.$

If this result is to correspond to $\sigma_{Ey} = \sigma_y r$ we would expect that
the factor $(1 - R/R_{11})^{1/2}$ would correspond in some way with r.
This is indeed the case and we shall now show that this factor is the
formula for the multiple correlation coefficient of $x_1$ on $x_2$ and $x_3$.

5. Multiple Correlation Coefficient. The ordinary correlation coefficient
between the observed values of $x_1$ and its corresponding
estimated values calculated from (19) is called the multiple correlation
coefficient of $x_1$ on $x_2$ and $x_3$. It is denoted by $r_{1.23}$, so we have

$r_{1.23} = \frac{\sum {}_{O}x_1 \cdot {}_{E}x_1\, f}{N \sigma_1 \sigma_{E1}},$

where ${}_{O}x_1$ and ${}_{E}x_1$ denote the observed and estimated values, respectively,
of $x_1$. Using (19) this may be written in the form

$N \sigma_1 \sigma_{E1} r_{1.23} = \sigma_1^2 \sum \frac{x_1}{\sigma_1} \left( -\frac{R_{12}}{R_{11}} \frac{x_2}{\sigma_2} - \frac{R_{13}}{R_{11}} \frac{x_3}{\sigma_3} \right) f$

$= \frac{N \sigma_1^2}{R_{11}} (-R_{12} r_{12} - R_{13} r_{13})$

$= \frac{N \sigma_1^2}{R_{11}} (R_{11} - R).$

Making use of (28) in the above result we have the required formula

(29)    $r_{1.23} = \left( 1 - \frac{R}{R_{11}} \right)^{1/2}.$




By a cyclical permutation of the subscripts we can write at once the
formulas for the multiple correlation coefficients of $x_2$ on $x_1$ and $x_3$,
and of $x_3$ on $x_1$ and $x_2$. They are

(30)    $r_{2.31} = \left( 1 - \frac{R}{R_{22}} \right)^{1/2},$

(31)    $r_{3.12} = \left( 1 - \frac{R}{R_{33}} \right)^{1/2}.$

By writing (26) in the form

$S_{1.23}^2 = \sigma_1^2 \left\{ 1 - \left( 1 - \frac{R}{R_{11}} \right) \right\}$

we obtain the formula

(32)    $S_{1.23}^2 = \sigma_1^2 (1 - r_{1.23}^2)$

which is quite analogous to the expression for $S_y^2$ in simple correlation.
It is clear from (32) that

(33)    $-1 \le r_{1.23} \le 1.$

Each of the formulas (29), (30), and (31) may be generalized for
k variables. Thus the multiple correlation coefficient of order $k - 1$
of $x_1$ with the other $k - 1$ variables is

(34)    $r_{1.23 \cdots k} = \left( 1 - \frac{R}{R_{11}} \right)^{1/2},$

where now $R_{ij}$ is the cofactor of $r_{ij}$ of R as defined in (23). While a
mathematical generalization gives a more complete and aesthetic
presentation, it is seldom that (22) or (34) are of value in practical
cases for more than four variables.

For computing purposes it is pleasant to know that multiple
correlation coefficients are expressible in terms of simple correlation
coefficients.


Example 1. Three variables have in pairs simple correlation coefficients given
by

$r_{12} = .8, \quad r_{13} = -.7, \quad r_{23} = -.9.$

Find the multiple correlation coefficient $r_{1.23}$ of $x_1$ on $x_2$ and $x_3$.

Solution.

$R = \begin{vmatrix} 1 & .8 & -.7 \\ .8 & 1 & -.9 \\ -.7 & -.9 & 1 \end{vmatrix} = .068, \quad R_{11} = .19, \quad r_{1.23} = .8013.$


Example 2. Suppose it is found that $r_{12} = .6$, $r_{13} = -.4$, $r_{23} = .7$. Comment
on these results.

Solution. $R = -.346$, $R_{11} = .51$, $r_{1.23} = 1.29$. Since by (33) a multiple
correlation coefficient cannot exceed 1, the given values are inconsistent.
Inspecting the given r’s we
observe that large values of $x_1$ are associated with large values of $x_2$, but since
$r_{13}$ is negative it would mean that small values of $x_1$ go with large values of $x_3$,
which is impossible when $r_{12}$ and $r_{23}$ are positive and as large as those given.
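
The arithmetic of Examples 1 and 2 can be reproduced with a few lines of Python. The sketch below is an illustration only; the function name `multiple_r_squared` is an assumption, and the correlations are read from the two examples as $r_{12}$, $r_{13}$, $r_{23}$ = .8, -.7, -.9 and .6, -.4, .7 respectively.

```python
from math import sqrt

def multiple_r_squared(r12, r13, r23):
    """Square of r_1.23 from formula (29): 1 - R / R11."""
    R = 1.0 + 2.0 * r12 * r13 * r23 - r12 ** 2 - r13 ** 2 - r23 ** 2
    R11 = 1.0 - r23 ** 2
    return 1.0 - R / R11

# Example 1: a consistent set of correlations
r_sq_example_1 = multiple_r_squared(0.8, -0.7, -0.9)
r_example_1 = sqrt(r_sq_example_1)

# Example 2: R is negative, so the square of r_1.23 exceeds 1
r_sq_example_2 = multiple_r_squared(0.6, -0.4, 0.7)
is_consistent = 0.0 <= r_sq_example_2 <= 1.0
```

In the second case the squared coefficient is greater than unity, which is the numerical sign that no set of observations could have produced the three correlations simultaneously.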


6. Limiting Cases. The following theorems are interesting in
themselves and shed light on interpretations of the theory in applications.

Theorem I. The necessary and sufficient condition for coincidence
of the three regression planes (19), (20), and (21), is

(35)    $r_{12}^2 + r_{13}^2 + r_{23}^2 - 2 r_{12} r_{13} r_{23} = 1.$

Proof. From elementary analytic geometry, we know that a
necessary and sufficient condition that two equations of the first
degree represent the same plane is that their coefficients be proportional.
For our equations this will be true when

$\frac{R_{11}}{R_{21}} = \frac{R_{12}}{R_{22}} = \frac{R_{13}}{R_{23}}$

and

$\frac{R_{21}}{R_{31}} = \frac{R_{22}}{R_{32}} = \frac{R_{23}}{R_{33}}.$

When expressed in terms of $r_{ij}$ these relations, it will be found, all
satisfy (35).

An alternate proof is as follows. When $S_{1.23}^2 = 0$ there is perfect
functional dependence between the variables, assuming linear regression.
It is evident from (26) that $S_{1.23}^2 = 0$ when R = 0. Upon
expanding R in terms of $r_{ij}$ and equating the result to zero we obtain
(35).

Corollary. Assuming linear regression, the criterion for perfect 
correlation between three variables is given by (35). 

Example 3. Given the following data, $r_{12} = .6$, $r_{13} = .4$. Find the value of
$r_{23}$ in order that $r_{1.23} = 1$.

Solution. Substituting the given values in (35) we have

$r^2 - .48 r - .48 = 0,$

where the subscripts are dropped for the moment. Solving, we find $r = .24 \pm .73$.
So $r_{23} = .97$.
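
The quadratic of Example 3 can be solved for any pair $r_{12}$, $r_{13}$ by treating (35) as an equation in $r_{23}$. The Python sketch below is illustrative; the function name is an assumption. It returns both roots of the quadratic. For the data of the example both roots lie between -1 and 1, and the positive one is the value retained in the solution above.

```python
from math import sqrt

def r23_for_perfect_correlation(r12, r13):
    """Roots in r23 of r23^2 - 2*r12*r13*r23 + (r12^2 + r13^2 - 1) = 0, i.e. of (35)."""
    half_b = r12 * r13
    discriminant = half_b ** 2 - (r12 ** 2 + r13 ** 2 - 1.0)
    root = sqrt(discriminant)
    return half_b - root, half_b + root

def det_R(r12, r13, r23):
    """Expansion of the 3 x 3 correlation determinant R."""
    return 1.0 + 2.0 * r12 * r13 * r23 - r12 ** 2 - r13 ** 2 - r23 ** 2

lower, upper = r23_for_perfect_correlation(0.6, 0.4)

# At either root the determinant R vanishes, so that r_1.23 = 1 by (29)
R_at_lower = det_R(0.6, 0.4, lower)
R_at_upper = det_R(0.6, 0.4, upper)
```

The discriminant equals $(1 - r_{12}^2)(1 - r_{13}^2)$, so the square root is always real for admissible values of $r_{12}$ and $r_{13}$.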

The example shows that even though $r_{12}$ and $r_{13}$ are individually
small, it does not follow that there cannot be high correlation between
$x_1$, $x_2$, and $x_3$. Indeed two variables which individually with a third
variable have correlations which are apparently worthless for predicting
purposes may be very valuable when the three variables are
taken together and multiple regression employed. On the other
hand, it may be possible to get as good a prediction from $r_{12}$ or $r_{13}$
using simple regression as from multiple regression. This situation
will be clarified by the following theorems.

Theorem II. If $r_{23} = 1$, then $r_{1.23}^2 = r_{12}^2 = r_{13}^2$, and $S_{1.23}^2 = \sigma_1^2 (1 - r_{12}^2)$.

Proof. When $r_{23} = 1$ then $R_{11} = 0$ and it would appear from
(29) that $r_{1.23}$ then becomes infinite. But this is impossible by (33).
When $r_{23} = 1$ it will also happen that $r_{12} = r_{13}$. The student can
easily verify this by letting $r_{23} = 1$ in (25) and subtracting (c) from
(b) there. So we shall first see what (29) becomes when $r_{13} = r_{12}$.
If in (a) of (25) we let $r_{13} = r_{12}$ we obtain $R - R_{11} = 2 r_{12} R_{12}$, since
$R_{13}$ then equals $R_{12}$. Substituting this result in (29) we soon have

$r_{1.23}^2 = \frac{-2 r_{12} R_{12}}{R_{11}} = \frac{2 r_{12}^2}{1 + r_{23}},$

remembering that $r_{13} = r_{12}$. Now if we let $r_{23} = 1$ in the last
expression we obtain the first conclusion of the theorem. The second
conclusion follows from the first and formula (32).

In this case, then, multiple regression has no advantage over the
simple regression $x_1$ on $x_2$ or $x_1$ on $x_3$, because the standard error is
exactly what it would be if the third variable were not added. Since
$r_{23} = 1$, there is perfect linear dependence between $x_2$ and $x_3$. Geometrically,
all the data lie in the regression plane.



Theorem III. When $r_{23} = 0$ then $r_{1.23}^2 = r_{12}^2 + r_{13}^2$.

Proof. When $r_{23} = 0$ it is easy to show that $R_{11} = 1$ and
$R = 1 - r_{12}^2 - r_{13}^2$. So from (29) we have

$r_{1.23}^2 = r_{12}^2 + r_{13}^2.$

The formula for the standard error of prediction then becomes

$S_{1.23}^2 = \sigma_1^2 (1 - r_{12}^2 - r_{13}^2).$

Hence, when $x_2$ and $x_3$ are completely independent, multiple regression
gives a better prediction than would be given by either of the
simple regressions $x_1$ on $x_2$ or $x_1$ on $x_3$; very much better if also $r_{12}$
and $r_{13}$ are nearly equal. If they are exactly equal their maximum
value is $(\tfrac{1}{2})^{1/2} = .707$. This theorem shows that one has a good
regression equation for predicting when each of two variables is
highly correlated with the third variable but not with each other.
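
The two limiting theorems can be checked numerically from formula (29). In the Python sketch below the function name and the trial values are hypothetical. The first call takes $r_{23} = 0$, as in Theorem III; the second takes $r_{12} = r_{13}$ with $r_{23}$ close to 1, so that the result may be compared with $2 r_{12}^2/(1 + r_{23})$ from the proof of Theorem II.

```python
def multiple_r_squared(r12, r13, r23):
    """Square of r_1.23 from formula (29): 1 - R / R11."""
    R = 1.0 + 2.0 * r12 * r13 * r23 - r12 ** 2 - r13 ** 2 - r23 ** 2
    R11 = 1.0 - r23 ** 2
    return 1.0 - R / R11

# Theorem III: with r23 = 0 the squares of r12 and r13 simply add
r_sq_uncorrelated = multiple_r_squared(0.5, 0.6, 0.0)

# Theorem II approached as a limit: r12 = r13 and r23 near 1
r_sq_near_limit = multiple_r_squared(0.5, 0.5, 0.999)
closed_form = 2.0 * 0.5 ** 2 / (1.0 + 0.999)
```

The first value is .25 + .36 = .61. The second agrees with the closed form and lies close to $r_{12}^2$ = .25, in accordance with Theorem II.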

7. Partial Correlation. It is often important to measure correlation
between two variables when the other variables have assigned
values. For the case of three variables, to which we limit our attention,
consider a slab parallel to the $x_1 x_2$ plane (Figure 13). This is
a sub-set of N which forms a two-way correlation table in which
the relations between $x_1$ and $x_2$ hold for a fixed value of $x_3$. The
correlation coefficient between $x_1$ and $x_2$ in this sub-distribution is
called the partial correlation coefficient between $x_1$ and $x_2$ for the
assigned $x_3$ and is conventionally denoted by

$r_{12.3}.$

The regression curves for the table consisting of this sub-distribution
are called the partial regression curves. A classical example of a
partial correlation coefficient is the correlation between statures of
fathers and sons when the stature of the mother is a particular value,
say 62 inches.

In order to express $r_{12.3}$ in terms of the total correlations $r_{ij}$, as we
were able to do in the case of $r_{1.23}$, it will be necessary to assume a
theoretical or ideal situation. Suppose we are dealing with a distribution
for which the total regression curves are straight lines and
the regression surfaces are planes. Then the partial regression line,
$x_1$ on $x_2$, in our table at $x_3$ will be a section of the regression plane,
$x_1$ on $x_2 x_3$, because the line will contain the mean points of all
the $x_1$ columns, defined by the points $(x_2, x_3)$, which lie in the table
at $x_3$.

In the two variable case, described in Part I, we learned that
$S_y^2$ was an average of the variances in the x arrays of y taken over
all the values of x. Moreover, when the distribution was normal
we proved that these variances were constant and $S_y^2$ was precisely
this constant variance. The three variable case, in the ideal distribution
we are about to consider, is quite analogous. Recall that
$S_{1.23}^2$ could be regarded, in the ordinary case of linear regression,
as an average of the variances of $x_1$ in the several columns at $(x_2, x_3)$
since, when regression is linear, the means of the columns lie on the
regression plane. Now let us assume that the distribution is homoscedastic
in the $x_1$ direction so that the variances in all the columns
of $x_1$ are the same. Under these assumptions, $S_{1.23}^2$ is the variance
in each column of $x_1$’s. Let $\sigma_{1.3}^2$ be the variance of $x_1$ in the table
at $x_3$. Remember that $r_{12.3}$ is the correlation coefficient in this
table and that regression is linear and homoscedastic. Therefore,
for the variance $S_{1.23}^2$ in each of the columns of this table we may
write

(36)    $S_{1.23}^2 = \sigma_{1.3}^2 (1 - r_{12.3}^2).$

Now consider the two-way table of totals $f(x_1, x_3)$. In this table,
$r_{13}$ is the total correlation between $x_1$ and $x_3$, and $\sigma_{1.3}^2$ is the variance
in an $x_3$ array of $x_1$’s. Since, under our assumption, $\sigma_{1.3}^2$ is constant
over these arrays, we may write

(37)    $\sigma_{1.3}^2 = \sigma_1^2 (1 - r_{13}^2).$

From (32), (36), and (37) we obtain

$(1 - r_{1.23}^2) = (1 - r_{13}^2)(1 - r_{12.3}^2),$

that is

$\frac{R}{R_{11}} = R_{22} (1 - r_{12.3}^2).$

Solving, we have

$r_{12.3}^2 = \frac{R_{11} R_{22} - R}{R_{11} R_{22}}.$

By expanding the R’s it is readily verified that

$R_{11} R_{22} - R = (-R_{12})^2$

is an identity. Therefore, we have the final result

(38)    $r_{12.3} = \frac{-R_{12}}{(R_{11} R_{22})^{1/2}}.$

This may be written, if desired, in the form

(38a)    $r_{12.3} = \frac{r_{12} - r_{13} r_{23}}{\{(1 - r_{13}^2)(1 - r_{23}^2)\}^{1/2}}.$

By letting $\sin \theta = r$, it is seen that tables of $\cos \theta = (1 - r^2)^{1/2}$ will
facilitate the computation of (38a) in numerical problems.

Since (38a) does not involve $x_3$, the value of $r_{12.3}$ for one assignment
of $x_3$ is the same as for any other assigned value of $x_3$. Therefore,
not only must the distribution be homoscedastic in the $x_1$ direction,
but also the value of $r_{12.3}$ in all slabs perpendicular to the $x_3$-axis
must be the same. It is fairly obvious that these conditions would
not, ordinarily, be satisfied in practical applications. So, in the
applications, $r_{12.3}$ is regarded as a sort of average value of the partial
correlations which could be obtained for all assignments of $x_3$. The
chief use of partial correlation is in testing what the correlation
between two variables would be if the third variable were not interfering
with the relationship.





Example 4. In a study of the factors which influence “academic success,”
May * obtained the following results (among others) based on the records of 450
students at Syracuse University.

    $X_1$ = honor points,           $\bar{X}_1 = 18.5$,    $\sigma_1 = 11.2$,    $r_{12} = .60$,
    $X_2$ = general intelligence,   $\bar{X}_2 = 100.6$,   $\sigma_2 = 15.8$,    $r_{13} = .32$,
    $X_3$ = hours of study,         $\bar{X}_3 = 24$,      $\sigma_3 = 6$,       $r_{23} = -.35$.

One purpose of the study was to find to what extent honor points were related
to general intelligence, when hours of study (per week) are held constant. Using
(38a) it is found that $r_{12.3} = .802$.
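
The computation in Example 4 may be sketched in Python as follows. The function name `partial_r` is an assumption, and the correlations of the example are read as $r_{12}$ = .60, $r_{13}$ = .32, $r_{23}$ = -.35.

```python
from math import sqrt

def partial_r(r12, r13, r23):
    """Partial correlation r_12.3 from formula (38a)."""
    return (r12 - r13 * r23) / sqrt((1.0 - r13 ** 2) * (1.0 - r23 ** 2))

# Correlations of Example 4: honor points (1), intelligence (2), hours of study (3)
r12_3 = partial_r(0.60, 0.32, -0.35)
```

The numerator is .60 + .112 = .712 and the denominator is about .8875, which gives the value .802 quoted in the example. The partial correlation exceeds the total correlation .60 because hours of study are negatively correlated with general intelligence.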


8. An Alternate Derivation. It is useful to approach the subject
of partial correlation from another point of view. Assume, as before,
that the variables $x_1$, $x_2$, $x_3$ are referred to the general mean as origin.
Suppose that we wish to know what the correlation between $x_1$ and $x_2$
would be if the influence of $x_3$ were eliminated. Let us subtract from
the $x_1$ of each point that part of $x_1$ which is due to the influence of $x_3$
as indicated by the regression line $x_1$ on $x_3$ and denote the residual
by $x_{1.3}$. Then subtract from the $x_2$ of each point that part of $x_2$
which is due to $x_3$ as indicated by the regression line $x_2$ on $x_3$ and
denote the residual by $x_{2.3}$. Thus we have

(39)    $x_{1.3} = x_1 - r_{13} \frac{\sigma_1}{\sigma_3} x_3,$
        $x_{2.3} = x_2 - r_{23} \frac{\sigma_2}{\sigma_3} x_3.$

We shall now prove that the simple correlation coefficient between
$x_{1.3}$ and $x_{2.3}$ is precisely $r_{12.3}$. By definition, this simple correlation
coefficient would be

(40)    $\frac{\sum x_{1.3}\, x_{2.3}\, f(x_1, x_2, x_3)}{N \sigma_{1.3} \sigma_{2.3}}.$

Making use of (39), the numerator of (40) becomes

$\sum x_1 x_2 f - r_{13} \frac{\sigma_1}{\sigma_3} \sum x_2 x_3 f - r_{23} \frac{\sigma_2}{\sigma_3} \sum x_1 x_3 f + r_{13} r_{23} \frac{\sigma_1 \sigma_2}{\sigma_3^2} \sum x_3^2 f$

$= N (\sigma_1 \sigma_2 r_{12} - \sigma_1 \sigma_2 r_{13} r_{23} - \sigma_1 \sigma_2 r_{13} r_{23} + \sigma_1 \sigma_2 r_{13} r_{23})$

$= N \sigma_1 \sigma_2 (r_{12} - r_{13} r_{23}).$

* Mark A. May, Predicting Academic Success, Journal of Educational
Psychology, 1923, vol. 14, 7, 429-440.



Now by (37),

$\sigma_{1.3} = \sigma_1 (1 - r_{13}^2)^{1/2}$

and similarly,

$\sigma_{2.3} = \sigma_2 (1 - r_{23}^2)^{1/2}.$

Inserting these results in (40) we obtain the promised result

$\frac{r_{12} - r_{13} r_{23}}{\{(1 - r_{13}^2)(1 - r_{23}^2)\}^{1/2}} = r_{12.3}.$

When interpreted according to this derivation, $r_{12.3}$ is sometimes
called the “net” correlation between $x_1$ and $x_2$.
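
The residual interpretation can be tried on a small set of numbers. In the Python sketch below the six observations and all function names are hypothetical. The sketch removes from $x_1$ and $x_2$ the parts indicated by their regression lines on $x_3$, correlates the residuals, and compares the result with formula (38a).

```python
from math import sqrt

def centered(values):
    """Measure a variable from its mean as origin."""
    m = sum(values) / len(values)
    return [v - m for v in values]

def sd(x):
    """Standard deviation of a variable already measured from its mean."""
    return sqrt(sum(a * a for a in x) / len(x))

def corr(x, y):
    """Simple correlation of two variables, each with mean zero."""
    return sum(a * b for a, b in zip(x, y)) / (len(x) * sd(x) * sd(y))

# Hypothetical observations, six points
X1 = [2.0, 4.0, 5.0, 7.0, 8.0, 10.0]
X2 = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0]
X3 = [3.0, 1.0, 4.0, 2.0, 6.0, 5.0]

x1, x2, x3 = centered(X1), centered(X2), centered(X3)
s1, s2, s3 = sd(x1), sd(x2), sd(x3)
r12, r13, r23 = corr(x1, x2), corr(x1, x3), corr(x2, x3)

# Residuals after removing the influence of x3
x1_3 = [a - r13 * (s1 / s3) * c for a, c in zip(x1, x3)]
x2_3 = [b - r23 * (s2 / s3) * c for b, c in zip(x2, x3)]

net_correlation = corr(x1_3, x2_3)
formula_38a = (r12 - r13 * r23) / sqrt((1.0 - r13 ** 2) * (1.0 - r23 ** 2))
```

The two quantities agree to within rounding error for any data whose correlations with $x_3$ are less than unity in absolute value, since the derivation above is purely algebraic.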

Interesting interpretations of multiple and partial correlation 
in terms of spherical trigonometry will be found in the following 
references: 

1. Burgess, The Mathematics of Statistics, pp. 266-267; Houghton Mifflin Co. 

2. Jackson, The Trigonometry of Correlation, Journal of the American
Mathematical Association, vol. 31, pp. 275-280.


Exercises 

1. Find the multiple correlation coefficients and the regression equations for
the data in Example 4.

2. (Garrett) The r for intelligence and school achievement in a group of
children 8 to 14 years old is .80. The r for intelligence and age in the
same group is .70. The r for school achievement and age is .60. What
will be the correlation between intelligence and school achievement in
children of the same age?

3. (Yule and Kendall) The following means, standard deviations, and
correlations are found for

    $X_1$ = seed-hay crops in cwts. per acre,
    $X_2$ = spring rainfall in inches,
    $X_3$ = accumulated temperature above 42° F. in spring,

in a certain district in England during 20 years.

    $\bar{X}_1 = 28.02$,   $\sigma_1 = 4.42$,   $r_{12} = .80$,
    $\bar{X}_2 = 4.91$,    $\sigma_2 = 1.10$,   $r_{13} = -.40$,
    $\bar{X}_3 = 594$,     $\sigma_3 = 85$,     $r_{23} = -.56$.

Find the partial correlations and the regression equation for hay crop on
spring rainfall and accumulated temperature.

4. Derive and explain the relation $\sigma_1^2 = \sigma_{E1}^2 + S_{1.23}^2$. What is the
corresponding relation in simple correlation?



5. The following data relate to land values and crops in twenty-five Iowa
counties.

    X1 = average value per acre of farm land on January 1, 1920,
    X2 = average yield of corn per acre in bushels 1910-1919,
    X3 = per cent of farm land in small grain,
    X4 = per cent of farm land in corn.

    County No.      X1     X2     X3     X4
         1        $ 87     40     11     14
         2         133     36     13     30
         3         174     34     19     30
         4         385     41     33     39
         5         363     39     25     33
         6         274     42     23     34
         7         235     40     22     37
         8         104     31      9     20
         9         141     36     13     27
        10         208     34     17     40
        11         115     30     18     19
        12         271     40     23     31
        13         163     37     14     25
        14         193     41     13     28
        15         203     38     24     31
        16         279     38     31     35
        17         179     24     16     26
        18         244     45     19     34
        19         165     34     20     30
        20         257     40     30     38
        21         252     41     22     35
        22         280     42     21     41
        23         167     35     16     23
        24         168     33     18     24
        25         115     36     18     21

(a) Find the linear regression equation of X1 on X2, X3, X4.

(b) Estimate the first five values of X1, using the equation obtained in (a).

(c) Calculate $S_{1.234}$ and $r_{1.234}$.



CHAPTER VI 


FUNDAMENTALS OF SAMPLING THEORY WITH SPECIAL 
REFERENCE TO THE MEAN 

1. Introduction.* To emphasize the viewpoint of the subject of 
this chapter it is convenient to recognize two general classes of problems
in mathematical statistics. In problems of the first class our
concern is largely with the exposition of methods of characterizing 
observed data. Thus in the first class would fall methods for summarizing
the pertinent information in a set of variates by means of
averages, measures of dispersion, indices of correlation, etc. In 
problems of the second class, however, the data at hand are regarded 
as a random sample drawn from a well-defined class of variates called 
the population or universe of discourse, and we are concerned with 
drawing inferences about the universe from the sample. By a sample, 
more precisely a random sample, we mean a sub-set of variates in 
which each individual from the universe has an equal and independent 
chance to be included. From this chosen sample we attempt to draw 
inferences concerning the universe. In order to deal with this inductive
argument we first consider a deductive argument; that is,
we first consider an infinite (or finite) universe and investigate the 
behavior of samples according to the laws of probability. The 
methodology dealing with this class of problems is known as sampling 
theory. Although the two classes of problems are not entirely distinct
with regard to their treatment, the center of interest in sampling
theory is the development of criteria for assisting common sense or 
educated judgment concerning the magnitude of chance fluctuations 
in statistical ratios, averages, and coefficients. 

The Bernoulli theory deals with sampling fluctuations in relative 
frequencies. In the words of Professor Rietz , 1 

But it is fairly obvious that the interest of the statistician in the effects of 
sampling fluctuations extends far beyond the fluctuations in relative frequencies. 
To illustrate, suppose we calculate any statistical measure such as an arithmetic 

* A reference list is given at the end of each of the following chapters to which 
attention is directed in the course of the discussion by the use of superscripts. 


mean, median, standard deviation, correlation coefficient, or parameter of a fre¬ 
quency function from the actual frequencies given by a sample of data. If we 
need then either to form a judgement as to the stability of such results from sample 
to sample or to use the results in drawing inferences about the sampled population, 
the common sense process of induction involved is much aided by a knowledge of 
the general order of magnitude of the sampling discrepancies which may reason¬ 
ably be expected because of the limited size of the sample from which we have 
calculated our statistical measures. 

A statistical measure calculated from the actual frequencies given 
by a sample has been called a statistic by R. A. Fisher.² This is to
avoid a verbal confusion with the corresponding parameter in the 
universe which we should like to know but can generally only esti¬ 
mate. It is a matter of common experience that a statistic will 
vary from sample to sample. To characterize the variation that 
may be tolerated on the basis of chance is one of the fundamental 
problems of sampling theory. 

In discussing such sampling fluctuations, Fisher³ introduces the
subject as follows: 

The idea of an infinite population distributed in a frequency distribution in 
respect of one or more characters is fundamental to all statistical work. From 
a limited experience, for example, of individuals of a species, or of the weather 
of a locality, we may obtain some idea of the infinite hypothetical population 
from which our sample is drawn, and so of the probable nature of future samples 
to which our conclusions are to be applied. If a second sample belies this ex¬ 
pectation we infer that it is, in the language of statistics, drawn from a different 
population; that the treatment to which the second sample of organisms had 
been exposed did in fact make a material difference, or that the climate (or 
methods of measuring it) had materially altered. Critical tests of this kind 
may be called tests of significance, and when such tests are available we may 
discover whether a second sample is or is not significantly different from the first. 

2. Method of Attack. The whole theory of sampling is based on 
frequency distributions and probability. In order to explain the 
tests of significance that have been developed, it is desirable to out¬ 
line briefly the philosophy underlying the method of attack. 

Sampling theory deals with specific questions like the following: 
Given the mean and standard deviation of a sample of N variates, 
how reliable are these estimates of the population mean and standard 
deviation, respectively? Given two samples, do their respective 
means or other statistics differ significantly? Can the differences 
be accounted for on the basis of chance or do the samples come from 
different populations? The answers require in general that we conceive
the universe as one distribution, the values of the statistic
calculated from all possible samples of size N from that universe 
as another distribution, and that there are mathematical expressions 
capable of representing both distributions. This is the chief reason 
for studying frequency curves and probability distributions. 

Suppose, for example, that we have computed a statistic — say the 
mean of 100 observations or measurements. What we get is not an 
absolutely fixed quantity which may be exactly reproduced again 
by taking 100 similar measurements. Indeed, if such an experiment 
were repeated many times, we would get values for the arithmetic 
mean which would form a frequency distribution. This distribution 
would have its own mean (mean of means), standard deviation, and 
higher moments. The law describing the frequency distribution of 
all possible means of samples of size N from a specified universe is 
called a distribution function when it can be expressed mathemati¬ 
cally. Its graph is called the curve of means. What has been said 
of the mean holds similarly for any other statistic. 

Formulation of statistical judgment about a sample involves the 
specification of the universe and the determination of the distribution 
function of a given statistic in samples of a given size drawn from 
this universe. The problem of determining the distribution functions 
for the various statistics from specified universes is one which has 
challenged modern mathematical research. In most cases it has 
been necessary to assume that the parent universe is of the normal 
form in order to obtain analytically the sampling distribution of the 
statistic. Many of the tests of significance are based upon this 
assumption. However, considerable information about sampling 
distributions from arbitrary universes is known in terms of their 
moments or expected values. 

3. Expected Values. Let the continuous variable x be subject to
the distribution function f(x) and let $\phi(x)$ be an arbitrary function
of x. Then the expected value of $\phi(x)$, denoted by application of the
operator E, is defined by

(1)    $E\{\phi(x)\} = \int_{-\infty}^{\infty} \phi(x) f(x)\,dx,$

provided this integral exists. In particular, if $\phi(x) = x^k$,
(k = 1, 2, ...), we have

       $E(x^k) = \int_{-\infty}^{\infty} x^k f(x)\,dx.$

For k = 1, this defines the mean of the x's in the universe represented
by f(x). Hereafter we will denote the mean of a universe of x's by $\tilde{x}$
and restrict $\bar{x}$ to denote the mean of a sample from that universe.
Therefore, we may write *

(2)    $E(x) = \tilde{x}.$

If $\phi(x) = (x - \tilde{x})^2$, we have the variance of x,

(3)    $\sigma_x^2 = E(x - \tilde{x})^2 = E(x^2) - \tilde{x}^2.$

The (positive) square root of $\sigma_x^2$ is called the standard deviation or
standard error of the distribution of x. Analogous definitions hold,
of course, for y.

If the variables x and y are simultaneously distributed in accord
with the function f(x, y), then

       $E(xy) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} xy\, f(x, y)\,dy\,dx.$

If x and y are not independent variables in the probability sense,
then, as we have seen in Chapter IV, $f(x, y) \neq g(x)h(y)$ where g(x)
and h(y) are the marginal distributions of x and y, respectively.
The correlation coefficient, $\rho$, between x and y in the bivariate
universe represented by f(x, y) is defined by

(4)    $\rho = \frac{E(xy) - \tilde{x}\tilde{y}}{\sigma_x \sigma_y}.$

The quantities $\tilde{x}$, $\sigma$, $\rho$, etc., relating to a universe are called
parameters.

The following propositions may easily be established from pre¬ 
ceding definitions so they are stated without proof. 

I. The expected value of the product of a variable and a constant is
equal to the product of the constant and the expected value of the variable.
That is,

       $E(cx) = cE(x).$

II. The expected value of deviations of a variable from its expected
value is zero. That is,

       $E(x - \tilde{x}) = 0.$

* $\tilde{x}$ is read “x tilde.”

III. The expected value of the sum of two or more variables is the sum
of their expected values. In symbols,

       $E(x + y + z) = E(x) + E(y) + E(z).$

IV. If x and y are mutually independent variables in the probability
sense, then the expected value of their product is equal to the product of
their expected values. That is,

       $E(xy) = E(x)E(y).$

V. The expected value of the product of deviations of two mutually
independent variables from their respective expected values is zero.
That is,

       $E\{(x - \tilde{x})(y - \tilde{y})\} = 0.$

VI. The expected value of the product of deviations of two correlated
variables from their respective expected values is given by

       $E\{(x - \tilde{x})(y - \tilde{y})\} = \rho\,\sigma_x \sigma_y.$
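The propositions above can be checked numerically on a small discrete universe. The following is a minimal sketch, not part of the original text: the joint probabilities are hypothetical, and the integral in definition (1) is replaced by its discrete analogue, a probability-weighted sum.

```python
from fractions import Fraction

# Hypothetical joint distribution of two correlated variables (x, y).
# The four probabilities are illustrative only and sum to 1.
joint = {
    (0, 0): Fraction(3, 10),
    (0, 1): Fraction(1, 10),
    (1, 0): Fraction(1, 5),
    (1, 1): Fraction(2, 5),
}

def expect(phi):
    """Discrete analogue of definition (1): sum of phi(x, y) times probability."""
    return sum(p * phi(x, y) for (x, y), p in joint.items())

mean_x = expect(lambda x, y: x)
mean_y = expect(lambda x, y: y)
var_x = expect(lambda x, y: (x - mean_x) ** 2)
var_y = expect(lambda x, y: (y - mean_y) ** 2)
cov_xy = expect(lambda x, y: (x - mean_x) * (y - mean_y))

# Proposition I: E(cx) = c E(x)
prop_1 = expect(lambda x, y: 7 * x) == 7 * mean_x
# Proposition II: E(x - mean) = 0
prop_2 = expect(lambda x, y: x - mean_x) == 0
# Proposition III: E(x + y) = E(x) + E(y)
prop_3 = expect(lambda x, y: x + y) == mean_x + mean_y
# Formula (3): variance = E(x^2) - mean^2
formula_3 = var_x == expect(lambda x, y: x * x) - mean_x ** 2
# Proposition VI combined with (4): E{(x - mean_x)(y - mean_y)} = E(xy) - mean_x * mean_y
prop_6 = cov_xy == expect(lambda x, y: x * y) - mean_x * mean_y

# Squared correlation coefficient, kept as an exact fraction.
rho_squared = cov_xy ** 2 / (var_x * var_y)
print(mean_x, mean_y, var_x, var_y, cov_xy, rho_squared)
print(prop_1, prop_2, prop_3, formula_3, prop_6)
```

Exact fractions are used so that each identity holds without rounding error. Because x and y are correlated in this example, Propositions IV and V do not apply to it.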

4. Standard Error of a Linear Function of Variables. Suppose
a variable is a linear function of two or more independent * variables
each of which may take on a universe of values and we require the
standard error of this function in terms of certain moments of the
underlying distributions of independent variables. To this end let

(5)    $w = c_1 x_1 + c_2 x_2 + \cdots + c_N x_N$

where each variable $x_k$, (k = 1, 2, ..., N), is arbitrarily distributed
and where the c's are arbitrary constants. Let $\sigma_k$ represent the
standard error of $x_k$ in the universe to which it belongs, and let $\rho_{ij}$
represent the correlation coefficient (if any correlation exists) between
$x_i$ and $x_j$. We seek the standard error of w, $\sigma_w$, in terms of $\sigma_k$ and
$\rho_{ij}$, (i = 1 to N, j = 1 to N).

Case I. We will suppose first that the variables in the several
universes are correlated, that is, that $\rho_{ij} \neq 0$ for every combination
of i and j. From (5) and Proposition III we have

(6)    $E(w) = c_1 E(x_1) + c_2 E(x_2) + \cdots + c_N E(x_N),$

that is

(7)    $\tilde{w} = c_1 \tilde{x}_1 + c_2 \tilde{x}_2 + \cdots + c_N \tilde{x}_N.$

* We are using the phrase “ independent variables ” here in the ordinary 
sense of analysis to designate the variables on which a special function depends, 
without any implication that these variables are independent of each other in 
the statistical sense. 



Then

       $E(w - \tilde{w})^2 = \sum_{i} c_i^2 E(x_i - \tilde{x}_i)^2 + \sum_{i \neq j} c_i c_j E\{(x_i - \tilde{x}_i)(x_j - \tilde{x}_j)\}$

which by definition (3) and Proposition VI becomes

(8)    $\sigma_w^2 = \sum_{i} c_i^2 \sigma_i^2 + \sum_{i \neq j} c_i c_j \rho_{ij} \sigma_i \sigma_j.$

If $c_1 = 1$, $c_2 = \pm 1$, and N = 2, we have as a special case

(9)    $\sigma_w^2 = \sigma_1^2 \pm 2\rho_{12}\sigma_1\sigma_2 + \sigma_2^2.$

Case II. Suppose the x's in (5) are mutually independent in the
statistical sense so that $\rho_{ij} = 0$. Then (8) becomes

(10)   $\sigma_w^2 = c_1^2\sigma_1^2 + c_2^2\sigma_2^2 + \cdots + c_N^2\sigma_N^2.$
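As an illustration of the variance of a linear function, the sketch below evaluates a sum of squared terms plus cross terms over all ordered pairs i ≠ j, and then specializes to two variables and to the independent case. The function name and the numerical values are assumptions made for the example.

```python
def linear_combination_variance(c, sigma, rho):
    """Variance of w = sum(c[i] * x[i]) for correlated variables.

    c     : list of constants c_i
    sigma : list of standard deviations sigma_i
    rho   : square matrix of correlation coefficients rho[i][j]
            (diagonal entries are ignored)
    """
    n = len(c)
    # Squared terms c_i^2 * sigma_i^2.
    total = sum(c[i] ** 2 * sigma[i] ** 2 for i in range(n))
    # Cross terms over all ordered pairs with i != j.
    for i in range(n):
        for j in range(n):
            if i != j:
                total += c[i] * c[j] * rho[i][j] * sigma[i] * sigma[j]
    return total

sigma = [3.0, 4.0]                       # hypothetical standard deviations
correlated = [[1.0, 0.5], [0.5, 1.0]]    # hypothetical rho_12 = 0.5
independent = [[1.0, 0.0], [0.0, 1.0]]   # rho_12 = 0

var_sum = linear_combination_variance([1, 1], sigma, correlated)     # w = x1 + x2
var_diff = linear_combination_variance([1, -1], sigma, correlated)   # w = x1 - x2
var_indep = linear_combination_variance([1, 1], sigma, independent)  # independent case
print(var_sum, var_diff, var_indep)
```

With these values the sum has variance 9 + 2(0.5)(3)(4) + 16 = 37, the difference has variance 9 - 12 + 16 = 13, and the independent case reduces to 9 + 16 = 25.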

5. Theorems. Relations (6)-(10) enable us to prove some in¬ 
teresting and useful theorems about the distribution of means of 
samples from an arbitrary universe. The following definition will 
make the notion of sample precise. 

Definition. Let $(x_1, x_2, \ldots, x_N)$ be a set of N independent variables
each subject to the same distribution function g, so that their joint
distribution function is

       $f(x_1, x_2, \ldots, x_N) = g(x_1)g(x_2) \cdots g(x_N).$

Then $(x_1, x_2, \ldots, x_N)$ is called a random sample of N from a universe
with distribution function g(x).

Table 6 exhibits the notation which will be used for the moments 
of the several distributions referred to in Theorems I—III. 


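Table 6. Notation

                      Universe            Sample        Distribution of Means

Mean                  $\tilde{x}$         $\bar{x}$     $E(\bar{x}) = \tilde{x}$
Standard Deviation    $\sigma_x$          $s$           $\sigma_{\bar{x}}$
Variance              $\sigma_x^2$        $s^2$         $\sigma_{\bar{x}}^2$
Skewness              $\alpha_{3:x}$      -             $\alpha_{3:\bar{x}}$
Kurtosis              $\alpha_{4:x}$      -             $\alpha_{4:\bar{x}}$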

Theorem I. If samples of size N be drawn from an arbitrary
universe and if $\bar{x}$ be the mean of a sample, then the mean of all possible
such means equals the mean of the universe. That is,

(11)   $E(\bar{x}) = \tilde{x}.$

Proof. In (5), let $c_1 = c_2 = \cdots = c_N = 1/N$ and let $x_1, x_2, \ldots, x_N$
constitute a sample from a universe with mean $\tilde{x}$ and variance
$\sigma_x^2$. Then $w = \bar{x}$. As a consequence of the definition of sample,
$E(x_i) = \tilde{x}$ for each value of i from one to N. Therefore, (6) gives us
$E(\bar{x}) = \tilde{x}$.

Theorem II. The variance of the sampling distribution of means
from an arbitrary universe equals the variance of the universe divided
by the number in the samples. In symbols,

(12)   $\sigma_{\bar{x}}^2 = \frac{\sigma_x^2}{N}.$

Hence,

(12a)  $\sigma_{\bar{x}} = \frac{\sigma_x}{\sqrt{N}}.$

Proof. As in the proof of Theorem I let $w = \bar{x}$. Then (10)
becomes

(13)   $\sigma_{\bar{x}}^2 = \frac{\sigma_1^2 + \sigma_2^2 + \cdots + \sigma_N^2}{N^2}.$

Since the x's constitute a sample, $\sigma_i^2 = \sigma_x^2$ for each value of i from
1 to N. So (13) reduces to (12).
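Theorems I and II can be verified exactly on a small universe by listing every possible sample. The sketch below is an illustration under stated assumptions: the four-valued universe is hypothetical, each value is taken as equally likely, and samples are drawn with replacement so that the N draws are independent, as the definition of a random sample requires.

```python
from fractions import Fraction
from itertools import product

universe = [1, 2, 3, 6]   # hypothetical universe, each value equally likely
N = 2                     # sample size

def mean(values):
    """Arithmetic mean as an exact fraction."""
    return Fraction(sum(values), len(values))

def variance(values):
    """Variance with divisor len(values), as an exact fraction."""
    m = mean(values)
    return sum((Fraction(v) - m) ** 2 for v in values) / len(values)

# Every ordered sample of size N drawn with replacement is equally likely,
# so the list of their means is the exact sampling distribution of the mean.
sample_means = [mean(s) for s in product(universe, repeat=N)]

mean_of_means = mean(sample_means)          # Theorem I: equals the universe mean
variance_of_means = variance(sample_means)  # Theorem II: universe variance / N

print(mean(universe), mean_of_means)
print(variance(universe), variance_of_means)
```

Here the universe has mean 3 and variance 7/2; the sixteen sample means have mean 3 and variance 7/4, in agreement with (11) and (12).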

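Theorem III. The moments describing skewness and kurtosis in the
sampling distribution of means are related to the corresponding moments
in the universe by the following formulas:

(14)   $\alpha_{3:\bar{x}} = \frac{\alpha_{3:x}}{\sqrt{N}}, \qquad \alpha_{4:\bar{x}} = \frac{\alpha_{4:x} - 3}{N} + 3.$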

A proof of (14) could be given by developing and applying addi¬ 
tional propositions on expected values. However, this method is 
tedious for the higher moments. A more elegant proof can be given 
by means of characteristic functions.⁴ Such a proof has been made
available by Shewhart⁶ for the discrete case.

The first and second theorems show us that in repeated samples,
$\bar{x}$ is distributed about $\tilde{x}$ with standard deviation $\sigma_x/\sqrt{N}$. Theorem
III tells us something about the form of the distribution. Thus if
the universe is normal so that $\alpha_{3:x} = 0$ and $\alpha_{4:x} = 3$, then from (14)
we see that $\alpha_{3:\bar{x}} = 0$ and $\alpha_{4:\bar{x}} = 3$, so the sampling distribution of $\bar{x}$
from a normal universe has the normal values for skewness and kurtosis.






In the next three theorems it will be understood that x and y are
correlated variables which are jointly distributed in accord with an
arbitrary function f(x, y) in which the parameters are $\tilde{x}$, $\tilde{y}$, $\sigma_x$, $\sigma_y$,
and $\rho$.

Theorem IV. Let $(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)$ be a sample of
N pairs drawn independently from the distribution characterized by
f(x, y) and let $(\bar{x}, \bar{y})$ be the mean of a sample. The correlation coefficient,
R, between the means of all possible such samples equals $\rho$.

Proof. By definition,

(15)   $R = \frac{E(\bar{x}\bar{y}) - \tilde{x}\tilde{y}}{\sigma_{\bar{x}}\,\sigma_{\bar{y}}},$

and

       $E(\bar{x}\bar{y}) = \frac{1}{N^2} E\{(x_1 + x_2 + \cdots + x_N)(y_1 + y_2 + \cdots + y_N)\} = \frac{E(S)}{N^2}$

where

       $S = x_1 y_1 + x_1 y_2 + \cdots + x_1 y_N$
       $\;\; + x_2 y_1 + x_2 y_2 + \cdots + x_2 y_N$
       $\;\; + \cdots$
       $\;\; + x_N y_1 + x_N y_2 + \cdots + x_N y_N.$

We will separate S into two parts, conveniently called u and v, where

       $u = x_1 y_1 + x_2 y_2 + \cdots + x_N y_N,$

and

       v = sum of ($N^2 - N$) terms of the form $x_i y_j$, $i \neq j$.

Then

       $E(u) = \sum_i E(x_i y_i) = N E(xy).$

In v, $x_i$ must be uncorrelated with $y_j$ since $i \neq j$. Therefore,

       $E(x_i y_j) = E(x_i)E(y_j) = \tilde{x}\tilde{y},$

and

       $E(v) = (N^2 - N)\tilde{x}\tilde{y}.$

So we have

       $E(S) = N E(xy) + (N^2 - N)\tilde{x}\tilde{y},$

and therefore

(16)   $E(\bar{x}\bar{y}) = \frac{1}{N}\{E(xy) + (N - 1)\tilde{x}\tilde{y}\}.$

Making use of Theorem II and (16) the right member of (15) reduces
to the definition of $\rho$.

Theorem V. Let $\bar{x}$ be the mean of a sample of N from g(x) and let
$\bar{y}$ be the mean of a sample of N from h(y) where g(x) and h(y) are the
marginal distributions of the universe characterized by f(x, y) of correlated
variables. Let $w = \bar{x} - \bar{y}$. The variance of the sampling
distribution of w is

(17)   $\sigma_w^2 = \frac{1}{N}(\sigma_x^2 - 2\rho\,\sigma_x\sigma_y + \sigma_y^2).$

The proof follows from (9) and Theorem IV.

Theorem VI. Let $\bar{x}$ and $\bar{y}$ be the means, $s_x$ and $s_y$ the standard
deviations, and r the correlation coefficient in a sample of N correlated
items. Suppose N is so large that $s^2$ is a good estimate * of $\sigma^2$ and r of $\rho$,
so that we may write



The variance of the sampling distribution of $w = \bar{x} - \bar{y}$ may be computed
from the sample by the formula


(18) 



(!>,• ~ £?/«•) 2 

N 


The proof follows from (17). 
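To make the large-sample use of (17) concrete, the sketch below substitutes sample values for the parameters and compares the result with the variance of the paired differences $d_i = x_i - y_i$ divided by N. The identity between the two computations is algebraic; whether it matches the exact printed form of (18) is an assumption, and the five data pairs are hypothetical (far too few for the large-N condition, and used only to show the arithmetic).

```python
import math

# Hypothetical paired measurements (x_i, y_i).
x = [68.0, 71.0, 65.0, 70.0, 74.0]
y = [66.0, 70.0, 66.0, 67.0, 72.0]
N = len(x)

def mean(values):
    return sum(values) / len(values)

def pvar(values):
    """Variance with divisor N (the sample moment convention)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

s_x = math.sqrt(pvar(x))
s_y = math.sqrt(pvar(y))
cov = mean([(a - mean(x)) * (b - mean(y)) for a, b in zip(x, y)])
r = cov / (s_x * s_y)

# Formula (17) with s_x, s_y, r substituted for sigma_x, sigma_y, rho.
var_w_from_r = (s_x ** 2 - 2 * r * s_x * s_y + s_y ** 2) / N

# Equivalent computation from the paired differences d_i = x_i - y_i.
d = [a - b for a, b in zip(x, y)]
var_w_from_d = pvar(d) / N

print(var_w_from_r, var_w_from_d)
```

The second route avoids computing r at all, which is convenient when only the differences are of interest.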

6. An Experiment. We will now describe an exercise in experi¬ 
mental sampling which will help make the theory more meaningful. 
It was performed by a class of thirty students who took the distri¬ 
bution of Table 7 as a “ universe.” 

In a box were placed 2000 discs † each bearing a number from the
set 1, 2, 3, ..., 25. The numbers on the discs were coded to the

* The problem of estimation is discussed in the next chapter.

† Small metal-rimmed price tags were used. Ideally, each individual disc
should be returned to the box before the next is drawn. However, this was not
insisted upon and an entire sample may have been drawn before replacement.





Table 7. Span among Adult Males. (See Table 20, Part I) 


   x        f

 58.5       1
 59.5       2
 60.5       1
 61.5       6
 62.5       7
 63.5      22
 64.5      55
 65.5     111
 66.5     146
 67.5     182
 68.5     229
 69.5     265
 70.5     263
 71.5     217
 72.5     176
 73.5     132
 74.5      82
 75.5      48
 76.5      20
 77.5      16
 78.5      12
 79.5       3
 80.5       1
 81.5       2
 82.5       1


span values in accordance with the scheme shown on page 107, and 
the frequency of the variously numbered discs equaled the fre¬ 
quency of the corresponding x’s. Each member of the class drew 
samples from the box according to the following directions. 

Directions 

1. Intermix the discs thoroughly and withdraw four random 
samples of ten discs each. 

2. Record the numbers in each sample of ten on the sampling record 
sheet (page 107); replace the discs in the box. 

3. For each sample of ten: find (a) mean span, (b) variance, (c)
standard deviation.

4. Combine the four samples into a single sample of forty and 
find the statistics named in 3. 
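The directions above can be imitated by a short simulation. This is only a sketch of one student's work, not the class experiment itself: it assumes each disc is replaced before the next draw (the ideal procedure mentioned in the footnote), uses the frequencies of Table 7 to fill the box, takes the coding u = x - 57.5 implied by the span values 58.5 to 82.5 and the disc numbers 1 to 25, and uses an arbitrary random seed.

```python
import random
import statistics

# Frequencies of Table 7 for disc numbers u = 1, 2, ..., 25.
freqs = [1, 2, 1, 6, 7, 22, 55, 111, 146, 182, 229, 265, 263,
         217, 176, 132, 82, 48, 20, 16, 12, 3, 1, 2, 1]

# The box: one entry per disc, 2000 discs in all.
box = [u for u, f in enumerate(freqs, start=1) for _ in range(f)]

rng = random.Random(1940)  # arbitrary seed so the run is repeatable

def draw_sample(n=10):
    """Draw n discs with replacement and decode each number to a span."""
    return [rng.choice(box) + 57.5 for _ in range(n)]

def summarize(sample):
    """Return (mean, variance, standard deviation), divisor N."""
    m = statistics.fmean(sample)
    var = statistics.pvariance(sample, mu=m)
    return m, var, var ** 0.5

samples = [draw_sample() for _ in range(4)]          # direction 1
stats = [summarize(s) for s in samples]              # direction 3
combined = [span for s in samples for span in s]     # direction 4
combined_stats = summarize(combined)

for k, (m, var, sd) in enumerate(stats, start=1):
    print(f"sample {k}: mean={m:.2f} variance={var:.3f} sd={sd:.3f}")
print("combined:", tuple(round(v, 3) for v in combined_stats))
```

Because the four samples are of equal size, the mean of the combined sample of forty equals the average of the four sample means; the combined variance, however, is not simply the average of the four variances.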





Sampling Record Sheet 


Span    Number     First     Second    Third     Fourth    Total
        on Disc    Sample    Sample    Sample    Sample

58.5       1
59.5       2
60.5       3
61.5       4
62.5       5
63.5       6
64.5       7
65.5       8
66.5       9
67.5      10
68.5      11
69.5      12
70.5      13
71.5      14
72.5      15
73.5      16
74.5      17
75.5      18
76.5      19
77.5      20
78.5      21
79.5      22
80.5      23
81.5      24
82.5      25

Mean*
Standard Deviation

* In computing the statistics let x denote span and u the number on a disc.
Then $u = x - 57.5$, $\bar{x} = \bar{u} + 57.5$, and $s_x = s_u$.


The results of 3(a) will be reproduced here. There were, of
course, 120 means from samples of N = 10. These were then
grouped into a frequency distribution. The resulting distribution 
and its moments, together with the moments of the universe, are 
given in Table 8. (The computations were made according to the 
definitions given in Part I for the moments of an observed distri¬ 
bution.) 

Although the chief purpose of the experiment is an appreciation 
of the theory, it is of interest to compare the experimental and 





Table 8. Distribution of the Means of 120 Samples of N = 10 Drawn 
from the Universe of Span 


Interval       Mid $\bar{x}$    Frequency

67.0-67.3      67.15         1
67.4-67.7      67.55         1
67.8-68.1      67.95         4
68.2-68.5      68.35         4
68.6-68.9      68.75         5
69.0-69.3      69.15        19
69.4-69.7      69.55        27
69.8-70.1      69.95        20
70.2-70.5      70.35        20
70.6-70.9      70.75         7
71.0-71.3      71.15         6
71.4-71.7      71.55         3
71.8-72.1      71.95         3

Moments of the distribution of means:
    Mean $\bar{x}$ = 69.785
    $s_{\bar{x}}$ = 0.8941
    $\alpha_{3:\bar{x}}$ = 0.052
    $\alpha_{4:\bar{x}}$ = 3.030

Moments of the universe:
    $\tilde{x}$ = 69.943
    $\sigma_x$ = 3.115
    $\alpha_{3:x}$ = 0.161
    $\alpha_{4:x}$ = 3.296


theoretical results. According to Theorem I the mean should be
69.943; we obtained 69.785. According to Theorem II the standard
deviation should be $3.115/\sqrt{10} = .985$; we obtained .894.
It is left as an exercise for the student to verify that the approximations
of the $\alpha$'s are also close.
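The comparison can be reproduced with a few lines of arithmetic. In the sketch below the universe mean and standard deviation are recomputed from the frequencies of Table 7, while the universe skewness and kurtosis are taken as tabulated in Table 8. The relations used for the $\alpha$'s, skewness divided by $\sqrt{N}$ and excess kurtosis divided by N, are the standard forms of Theorem III and are stated here as an assumption.

```python
import math

# Table 7: span values 58.5, 59.5, ..., 82.5 and their frequencies.
spans = [58.5 + k for k in range(25)]
freqs = [1, 2, 1, 6, 7, 22, 55, 111, 146, 182, 229, 265, 263,
         217, 176, 132, 82, 48, 20, 16, 12, 3, 1, 2, 1]
M = sum(freqs)  # total number of discs

universe_mean = sum(f * x for x, f in zip(spans, freqs)) / M
universe_var = sum(f * (x - universe_mean) ** 2 for x, f in zip(spans, freqs)) / M
universe_sd = math.sqrt(universe_var)

N = 10  # size of each sample in the experiment

# Theorem II: predicted standard deviation of the 120 sample means.
predicted_sd_of_means = universe_sd / math.sqrt(N)

# Theorem III (standard form, assumed), using the universe values of Table 8.
alpha3_universe, alpha4_universe = 0.161, 3.296
predicted_alpha3 = alpha3_universe / math.sqrt(N)
predicted_alpha4 = 3 + (alpha4_universe - 3) / N

print(f"universe mean {universe_mean:.3f}, sd {universe_sd:.3f}")
print(f"predicted sd of means {predicted_sd_of_means:.3f}  (observed 0.894)")
print(f"predicted alpha3 {predicted_alpha3:.3f}  (observed 0.052)")
print(f"predicted alpha4 {predicted_alpha4:.3f}  (observed 3.030)")
```

The predicted skewness and kurtosis of the means come out near 0.051 and 3.030, which may be set beside the observed 0.052 and 3.030 of Table 8.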

We may think of this “universe” as approximating a Type III
curve and the distribution of Table 8 as approximating its sampling
curve of means (Figure 18). To represent graphically a universe
and the curve of sample means from that universe would require
analytical expressions for both these distributions. As yet, neither
a type of universe has been specified nor has the functional form
of the curve of means from that universe been determined. However,
Figure 18 will help the student appreciate the meaning of some
of the moment relations developed in § 5.

Fig. 18. Depicting the Sampling Distribution of Means
from a Type III Universe

7. Reproductive Property of Normal Law. An important problem 
is to find the distribution function of the sum of several independent 
variables when these variables are normally distributed. It suffices 
to show how this problem can be solved for the sum of two such 
variables. The following discussion follows closely a proof given 
by Jackson.⁶

Let x and y be independent variables and normally distributed
about zero as mean with standard deviations $\sigma_1$ and $\sigma_2$, respectively.
Their distribution functions will have the forms

       $g(x) = C_1 e^{-ax^2}, \qquad h(y) = C_2 e^{-by^2}, \qquad a = \frac{1}{2\sigma_1^2}, \qquad b = \frac{1}{2\sigma_2^2};$

the explicit values $1/C_1 = \sigma_1(2\pi)^{1/2}$, $1/C_2 = \sigma_2(2\pi)^{1/2}$, for total
frequency 1, are not needed at the moment.

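If f(x, y) is the joint distribution function for x and y with marginal
distributions g(x) and h(y) we shall first show that the frequency
function, H(w), for the variable w = x + y is

       $H(w) = \int_{-\infty}^{\infty} f(x, w - x)\,dx.$

We have $\alpha < w < \beta$ when $\alpha - x < y < \beta - x$; these inequalities define
a strip of the (x, y)-plane for which the corresponding frequency is

       $F(\alpha, \beta) = \int_{-\infty}^{\infty} \int_{\alpha - x}^{\beta - x} f(x, y)\,dy\,dx;$

in the integration with respect to y, the substitution w = x + y,
y = w - x, makes

       $\int_{\alpha - x}^{\beta - x} f(x, y)\,dy = \int_{\alpha}^{\beta} f(x, w - x)\,dw,$

and hence

       $F(\alpha, \beta) = \int_{-\infty}^{\infty} \int_{\alpha}^{\beta} f(x, w - x)\,dw\,dx$
       $\qquad = \int_{\alpha}^{\beta} \int_{-\infty}^{\infty} f(x, w - x)\,dx\,dw$
       $\qquad = \int_{\alpha}^{\beta} H(w)\,dw.$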





We can now proceed with the main part of the proof. Since x
and y are independent, their joint distribution may be written

       $f(x, y) = g(x)h(y)$

and so we have

       $H(w) = \int_{-\infty}^{\infty} f(x, w - x)\,dx = C_1 C_2 \int_{-\infty}^{\infty} e^{-ax^2 - b(w - x)^2}\,dx.$

To evaluate this integral write the exponential expression in the form

       $ax^2 + b(w - x)^2 = (a + b)\left(x - \frac{bw}{a + b}\right)^2 + \frac{ab}{a + b}\,w^2 = (a + b)z^2 + cw^2,$

where

       $z = x - \frac{bw}{a + b}, \qquad c = \frac{ab}{a + b} = \frac{1}{2(\sigma_1^2 + \sigma_2^2)}.$

The value of w being regarded as constant for the integration with
respect to x, so that incidentally dz = dx, the expression for H(w)
can be written in the form

       $H(w) = C_1 C_2 e^{-cw^2} \int_{-\infty}^{\infty} e^{-(a + b)z^2}\,dz = K e^{-cw^2},$

where

       $K = C_1 C_2 \int_{-\infty}^{\infty} e^{-(a + b)z^2}\,dz = C_1 C_2 \left(\frac{\pi}{a + b}\right)^{1/2} = \frac{1}{\sigma_w (2\pi)^{1/2}},$

and

       $\sigma_w^2 = \frac{1}{2c} = \sigma_1^2 + \sigma_2^2.$

If x, y, and u are independent and normally distributed, the
quantity x + y + u can be regarded as the sum of the two independent
normally distributed variables x + y and u, and so is itself
normally distributed. The conclusion can be carried over by induction,
without further calculation, to the sum of any finite number
of variables. Hence we have the following theorem.

Theorem VII. If $x_1, x_2, \ldots, x_N$ are independent variables and
normally distributed with variances $\sigma_1^2, \sigma_2^2, \ldots, \sigma_N^2$, the function
$w = \sum_{i=1}^{N} c_i x_i$ is normally distributed with variance
$\sigma_w^2 = \sum_{i=1}^{N} c_i^2 \sigma_i^2$.

The essential feature of the theorem is the part relating to the 
form of the distribution. This rather remarkable property of a 
linear function of normally distributed variables is sometimes called 
the reproductive property of the normal distribution. The part of 
the theorem relating to the magnitude of the variance follows neces¬ 
sarily from a general formula which was previously established 
without supposing the variables normally distributed or otherwise 
specialized. 

Corollary. The sampling distribution of means from a normal
universe is itself normally distributed. The mean of the sampling dis¬
tribution is the same as the mean of the universe and its variance is the 
variance of the universe divided by the size of the sample. 

The proof is left to the student. One should not conclude that it 
is generally true that the means of samples of N are distributed 
according to the same type of function which specifies the universe 
from which they are drawn. But the magnitudes of the mean and 
variance, as given in (11) and (12), are general in the sense that they 
are true for the sampling distribution of means from any infinite 
universe. 

8. Non-Normal Universes. From analytic considerations, comparatively
little is known at present about the exact distributions of
statistics for samples drawn from non-normal universes. In a recent
paper, Rietz⁷ has listed the contributions and summarized the
progress that has been made in this connection. The reader may
refer to this paper.

With regard to the mean, Theorem III tells us that $\alpha_{3:\bar{x}} \to 0$ and
$\alpha_{4:\bar{x}} \to 3$ as $N \to \infty$. So, even though the universe is far from normal,
if the sample is made large enough, the sampling distribution
of $\bar{x}$ approaches the normal form as characterized by skewness and
kurtosis. (The conditions $\alpha_3 = 0$ and $\alpha_4 = 3$ are necessary but not
sufficient conditions for a normal distribution.) Even for comparatively
small values of N there is sufficient experimental evidence to
consider the distribution of $\bar{x}$ as normal to a high degree of approximation.

Finite Universes. So far we have assumed that the universe
was “infinite,” that is, that it was indefinitely large in all its classes,
as compared with the sample. This condition could be satisfied 
with a limited supply, for example in the experiment described in 
§ 6, by replacement after each individual draw. However, if the 
entire sample is drawn from a limited supply before replacement, the 
probability of drawing an individual from a given class will be af¬ 
fected each time that one is drawn from that class. In such a case 
the universe is said to be “ finite.” 

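If M is the total frequency of a finite universe, the first four
moments of the sampling distributions of $\bar{x}$ are as follows:

(19)   $E(\bar{x}) = \tilde{x}, \qquad \sigma_{\bar{x}}^2 = \frac{M - N}{N(M - 1)}\,\sigma_x^2,$

       $\alpha_{3:\bar{x}} = \frac{(M - 1)^{1/2}(M - 2N)}{\{N(M - N)\}^{1/2}(M - 2)}\,\alpha_{3:x},$

       $\alpha_{4:\bar{x}} = \frac{(M - 1)\{(M^2 - 6MN + M + 6N^2)\,\alpha_{4:x} + 3M(M - N - 1)(N - 1)\}}{N(M - N)(M - 2)(M - 3)}.$

Their origin is doubtful.⁸ They are more general than the formulas
given in (12) and (14) and reduce to them if $M \to \infty$.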
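The finite-universe variance can be confirmed by direct enumeration. The sketch below lists every sample of N items drawn without replacement from a small hypothetical universe of M items and compares the variance of the sample means with the factor (M - N)/(N(M - 1)) applied to the universe variance; the five values are illustrative only.

```python
from fractions import Fraction
from itertools import combinations

universe = [1, 2, 3, 6, 8]   # hypothetical finite universe, M = 5
M = len(universe)
N = 2                        # sample size, drawn without replacement

def mean(values):
    return Fraction(sum(values), len(values))

def variance(values):
    m = mean(values)
    return sum((Fraction(v) - m) ** 2 for v in values) / len(values)

# All C(M, N) unordered samples are equally likely.
sample_means = [mean(s) for s in combinations(universe, N)]

observed = variance(sample_means)
predicted = Fraction(M - N, N * (M - 1)) * variance(universe)
infinite_universe_value = variance(universe) / N   # Theorem II, for comparison

print(mean(sample_means), mean(universe))
print(observed, predicted, infinite_universe_value)
```

For this universe the variance of the means is 51/20, smaller than the value 17/5 that Theorem II would give for sampling with replacement.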

The conclusion of investigators is that the distribution of means
from nearly any finite universe is practically normal. In this connection
the following striking example is given by Carver.⁹

A group of students chose arbitrarily the following most unusual 
distribution for a parent universe: 


Table 9

    X        f

   15        9
    3        2
   29       43
  405      189
 1710       37

Total      280

and found the distribution of $\sum_{1}^{N} x = N\bar{x}$ of 1000 samples of twenty-five
variates each shown in Table 10. It was obtained as follows.


Table 10

 Class        f

 5,000-       2
 7,000-      54
 9,000-     203
11,000-     310
13,000-     254
15,000-     130
17,000-      36
19,000-       9
21,000-       2

Total      1000


Two hundred and eighty Hollerith cards were punched with numbers 
corresponding to the two hundred and eighty variates of the parent 
population. The cards were thoroughly shuffled and then placed
in a tabulating machine. After twenty-five cards had run through 
the electric tabulator their total was recorded. By repeating this 
procedure one thousand samples were readily obtained. It is thus 
possible to obtain experimentally some appreciation of the sensi¬ 
tivity of the sampling distribution of means to changes in population 
form. Carver concludes that if the sample N is fifty or larger and 
the population is at least ten times N , the parent population has 
relatively little control over the shape of the distribution of x. 
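A rough modern imitation of this card experiment is sketched below. It assumes that each sample of twenty-five cards is drawn without replacement from the full deck of 280 and that the deck is restored before the next sample; the text does not state how the cards were returned, and the random seed is arbitrary. The counts obtained will therefore differ in detail from Table 10.

```python
import random

# Parent universe of Table 9: value -> frequency.
table_9 = {15: 9, 3: 2, 29: 43, 405: 189, 1710: 37}
population = [value for value, f in table_9.items() for _ in range(f)]

rng = random.Random(9)  # arbitrary seed so the run is repeatable

# 1000 samples of 25 cards each; record the total of each sample.
totals = [sum(rng.sample(population, 25)) for _ in range(1000)]

# Group the totals into classes of width 2000 beginning at 5,000, as in Table 10.
classes = {}
for t in totals:
    lower = 5000 + 2000 * ((t - 5000) // 2000)
    classes[lower] = classes.get(lower, 0) + 1

mean_total = sum(totals) / len(totals)
expected_total = 25 * sum(population) / len(population)

for lower in sorted(classes):
    print(f"{lower:>6,}-  {classes[lower]}")
print(f"mean of totals {mean_total:.0f}, expected {expected_total:.0f}")
```

Even though the parent universe is extremely skewed, the grouped totals form a single-humped distribution centered near 25 times the universe mean.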

Another set of experiments was conducted by Shewhart¹⁰ who
comes to the following conclusion: 

Such evidence, supported by more rigorous analytical methods beyond the 
scope of the present discussion, leads us to believe that in almost all cases in 
practice we may establish sampling limits for averages of samples of four or more 
upon the basis of normal law theory. 

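9. Tchebycheff's Inequality. In (1) replace x by w, let $\phi(w) = (w - \tilde{w})^2$,
and in the expression for $E\{(w - \tilde{w})^2\}$ replace all values
of w larger than $\tilde{w} + \delta\sigma$ by $\tilde{w} + \delta\sigma$ and all values of w less than
$\tilde{w} - \delta\sigma$ by $\tilde{w} - \delta\sigma$ where $\delta$ is a positive number. Then

(20)   $E\{(w - \tilde{w})^2\} \geq k^2 + (\delta\sigma)^2 P_\delta \geq \delta^2\sigma^2 P_\delta,$

where

       $k^2 = \int_{\tilde{w} - \delta\sigma}^{\tilde{w} + \delta\sigma} (w - \tilde{w})^2 f(w)\,dw \geq 0,$

and $P_\delta$ is the probability that w lies outside the interval
$(\tilde{w} - \delta\sigma, \tilde{w} + \delta\sigma)$. Since $E\{(w - \tilde{w})^2\} = \sigma^2$, from (20) we have

(21)   $P_\delta \leq \frac{1}{\delta^2}$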

and therefore the following theorem. 

Theorem VIII. The probability is not more than $1/\delta^2$ that a value
of w taken at random from the universe f(w) will differ from its expected
value by more than a multiple $\delta$ of its standard deviation.

This theorem is known variously as Tchebycheff's theorem,
criterion, or inequality. A striking property is its independence
of the nature of the distribution of w. But the gain in generality 
must be paid for and the price is inadequate information about the 
particular. That is, the inequality (21) may be too wide to be of 
practical value in passing judgments on sampling fluctuations in a 
known or proposed distribution. Nevertheless, it does have some 
useful applications, two of which will now be given. 
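To see how wide the inequality can be in a particular case, the sketch below evaluates the exact tail probability for the very unusual parent universe of Table 9 and compares it with the Tchebycheff bound. The choice of universe and of the multiples $\delta$ is ours, made only for illustration.

```python
import math

# Parent universe of Table 9: value -> frequency.
table_9 = {15: 9, 3: 2, 29: 43, 405: 189, 1710: 37}
M = sum(table_9.values())

mean = sum(v * f for v, f in table_9.items()) / M
sd = math.sqrt(sum(f * (v - mean) ** 2 for v, f in table_9.items()) / M)

def tail_probability(delta):
    """Exact probability that a value differs from the mean by more than delta * sd."""
    outside = sum(f for v, f in table_9.items() if abs(v - mean) > delta * sd)
    return outside / M

for delta in (1.5, 2.0, 3.0):
    exact = tail_probability(delta)
    bound = 1 / delta ** 2
    print(f"delta={delta}: exact {exact:.4f} <= bound {bound:.4f}")
```

At $\delta = 2$ only the 37 values equal to 1710 lie outside the interval, so the exact probability is 37/280, about 0.13, against a bound of 0.25. The inequality holds but is far from sharp, which is the price of its generality.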

10. Law of Large Numbers. The Bernoulli theorem (Chapter I,
§ 7) can now be established. Let w = x/s, x being the number of successes
in s trials. Then $\tilde{w} = p$. Let $P_\delta$ be the probability that
x/s lies outside the interval $(p - \epsilon, p + \epsilon)$, where $\epsilon > 0$. We may
take $\epsilon = \delta(pq/s)^{1/2}$, a multiple of the standard deviation of the
relative frequency x/s. Accordingly, by Theorem VIII we have

       $P_\delta \leq \frac{1}{\delta^2}.$

Since

       $\frac{1}{\delta^2} = \frac{pq/s}{\epsilon^2},$

we obtain the inequality

       $P_\delta \leq \frac{p(1 - p)}{s\epsilon^2}.$

For any assigned $\epsilon$, $P_\delta$ can be made arbitrarily small by increasing s.
Thus x/s becomes increasingly reliable as an estimate of p as s
increases.
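A short numerical illustration shows how the bound shrinks as the number of trials grows. The function name and the values of p and of the tolerance are hypothetical; the function simply evaluates the right member of the inequality above.

```python
def bernoulli_bound(p, s, eps):
    """Upper bound p(1 - p) / (s * eps^2) on the probability that the
    relative frequency x/s differs from p by more than eps."""
    return p * (1 - p) / (s * eps ** 2)

p, eps = 0.5, 0.05   # hypothetical probability of success and tolerance
bounds = {s: bernoulli_bound(p, s, eps) for s in (100, 1000, 2000, 10000)}

for s, b in bounds.items():
    # A bound of 1 or more carries no information, since probabilities cannot exceed 1.
    print(f"s={s:>6}: P <= {min(b, 1.0):.4f}")
```

With p = 0.5 and a tolerance of 0.05, the bound is uninformative at 100 trials, falls to 0.05 at 2000 trials, and to 0.01 at 10,000 trials.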



The inequality of Tchebycheff can also be used to prove the stability
of the means of large samples. Consider a sample of N from
f(x) in which the variance is $\sigma^2$. Let w be a linear function of the
sample defined by

       $w = \frac{x_1 + x_2 + \cdots + x_N}{N}.$

Suppose $c^2$ is a constant such that $\sigma^2 < c^2$. Since $w = \bar{x}$, we have

       $\sigma_{\bar{x}}^2 = \frac{\sigma^2}{N} < \frac{c^2}{N}.$

Let P be the probability that $(\bar{x} - \tilde{x})^2 > h^2$. That is, P is the
probability that

       $(\bar{x} - \tilde{x})^2 > \frac{Nh^2}{c^2} \cdot \frac{c^2}{N}.$

Therefore, from Theorem VIII,

       $P \leq \frac{c^2}{Nh^2}.$

Since c and h are fixed, P can be made arbitrarily small by taking N 
sufficiently large. Hence we have the following theorem. 

Theorem IX. The probability that the mean of a sample of N variates
will differ numerically by more than a given positive number h from the
mean of the universe can be made arbitrarily small by taking N suffi¬ 
ciently large. 

Under the conditions of the theorem, x is said to converge stochasti¬ 
cally to x. This type of convergence, however, should not be con¬ 
fused with convergence in the sense of analysis. 

11. Probability Scale of Sampling Fluctuations. Now that the 
personae dramatis have been assembled, we can state a theorem 
which tells us what the approximate probability is that the mean of 
a sample will deviate by an assigned amount from a hypothetical 
mean. We are assuming here that $\sigma_x$ is known; the case where $\sigma_x$
is unknown will be discussed later.

We know that $\bar{x}$ is (or tends to be) normally distributed about $\tilde{x}$
with standard deviation $\sigma_{\bar{x}} = \sigma_x/\sqrt{N}$. If the distribution of $\bar{x}$ be
reduced to standard units by the transformation

(22)   $t = \frac{\bar{x} - \tilde{x}}{\sigma_x/\sqrt{N}},$

then we know that t is approximately normally distributed about
zero with standard deviation of unity. Hence we can refer to a normal
probability scale for the probability that one would obtain a
random sample for which $\bar{x}$ differs from $\tilde{x}$ by as much as $|\delta|$, where
$\delta$ is expressed in the $\sigma_{\bar{x}}$ unit. So we have the following theorem.

Theorem X. The probability $Q_\delta$ that a random sample from an
infinite universe will have a mean, $\bar{x}$, which will be within an interval $\delta$
of the mean, $\tilde{x}$, of the universe is approximately

       $Q_\delta = \int_{-|\delta|}^{|\delta|} \phi(t)\,dt,$

where $\delta$ is the observed value of t given by (22) and $\phi(t)$ is the normal
curve. Then $P_\delta = 1 - Q_\delta$ is the approximate probability that $\bar{x}$ will
not be within $|\delta|$ of $\tilde{x}$. If the universe is normal, $P_\delta$ gives the exact
probability.

Fig. 19. $P_\delta$ is the shaded area under the normal curve $\phi(t)$. $Q_\delta$ is the
Probability for a Deviation as Small as $|\delta|$, and $P_\delta$ is the Probability
for a Deviation as Large as $|\delta|$
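The probabilities of Theorem X are easily computed from the error function, since for the unit normal curve the area between -|δ| and |δ| equals erf(|δ|/√2). The sketch below is an illustration only: the function names are ours, and the worked case (a sample of ten with mean 71.5 drawn from the span universe of § 6, with universe mean 69.943 and standard deviation 3.115) is hypothetical.

```python
import math

def standardized_deviation(sample_mean, universe_mean, universe_sd, n):
    """Transformation (22): deviation of the sample mean in standard units."""
    return (sample_mean - universe_mean) / (universe_sd / math.sqrt(n))

def q_within(delta):
    """Area under the unit normal curve between -|delta| and |delta|."""
    return math.erf(abs(delta) / math.sqrt(2))

def p_outside(delta):
    """Probability of a deviation as large as |delta| in either direction."""
    return 1.0 - q_within(delta)

# Hypothetical case: a sample of N = 10 from the span universe has mean 71.5.
delta = standardized_deviation(71.5, 69.943, 3.115, 10)
print(f"delta = {delta:.3f}, Q = {q_within(delta):.3f}, P = {p_outside(delta):.3f}")

# Familiar reference points on the normal probability scale.
for d in (1.0, 1.96, 2.576):
    print(f"|delta| = {d}: P = {p_outside(d):.4f}")
```

In the hypothetical case the deviation is about 1.58 standard units, and a deviation at least that large would occur by chance in roughly one sample out of nine.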

12. Null Hypothesis and Significance Tests. The rationale underlying
sampling theory has been summarized by E. S. Pearson¹¹
as follows:




In applying the methods of statistical analysis it is generally our aim to dis¬ 
criminate between two or more alternative hypotheses regarding the factors 
which have controlled certain observed events, which form what we term a 
sample or samples. If the process is examined in a little detail it will be found 
that the procedure may be described as follows: 

(a) We define a hypothesis to be tested.

(b) We choose the criterion (or criteria) whose numerical value, derivable 
from the observations, is most suitable for testing the hypothesis. In 
doing this we recognize that the criterion is not a single-valued expression 
even if the hypothesis be true, but will vary from one sample of observa¬ 
tions to another. 

(c) We therefore refer the observed value of the criterion to this sampling
distribution, e.g., to a normal probability scale, etc., and so obtain a
measure of the likelihood of the hypothesis.

(d) Finally, if judged on this probability scale the observed criterion is not 
exceptional, we conclude that upon the information available there are no 
grounds for discarding the hypothesis; or if the value prove exceptional 
we consider the possibility of alternative hypotheses. 

An hypothesis which is tested for possible rejection under the 
assumption that it is true has been called by Fisher 12 a null hypothesis. 
In other words, null hypothesis refers to a particular form of popula¬ 
tion distribution which is assumed in considering whether or not a 
sample could reasonably have arisen from the population which, 
in fact, was assumed. If the sample could not reasonably have 
arisen from the population proposed, as measured by a significance 
test, we say that the null hypothesis is refuted for the level of signifi¬ 
cance adopted. If the significance test yields a verdict of “ not 
significant ” for the probability level adopted, we say that the null 
hypothesis is not refuted or contradicted at that level. 

It is open to the investigator to be more or less exacting concerning 
the smallness of the probability he would require before he would be 
willing to admit that his test has demonstrated a significant result. 
Good judgment in these matters comes only from much experience 
in the particular field in which the problem occurs. However, it is 
conventional among certain workers to adopt the following rule: 

If $P_\delta > .05$, $\delta$ is not significant;
if $P_\delta < .01$, $\delta$ is significant;
if $.05 > P_\delta > .01$,

our conclusions about $\delta$ are doubtful and we cannot say with much
certainty whether the deviation is significant or not until we have addi¬ 
tional information. Other workers prefer a more conservative level 
of significance. 





Example 1. Suppose the mean span of 100 persons is found to be $\bar{x} = 70.56$
inches. Does this differ significantly from the mean $\tilde{x} = 69.943$ of the “universe”
with standard deviation $\sigma_x = 3.115$? Calculating the above test we find

$\delta = \dfrac{70.56 - 69.943}{3.115/\sqrt{100}} = 1.98$.

Referring to the normal probability scale we find
the chance of a difference between the observed and hypothetical means as large
as that noted to be $P_\delta = .048$, approximately. Our conclusion is that the given statistic
$\bar{x} = 70.56$ is not exceptional, although it is possible that it came from a different
universe, that is, in this case a different race of men.
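
The arithmetic of Example 1 and the conventional three-way rule stated above can be sketched as follows. This is an illustration under stated assumptions, not the author's procedure: the helper names `two_sided_p` and `verdict` are hypothetical, and the tail area is taken from the complementary error function in Python's standard library rather than from a printed table.

```python
import math

def two_sided_p(delta):
    """P: probability of a standardized deviation at least as large as |delta|."""
    return math.erfc(abs(delta) / math.sqrt(2.0))

def verdict(p):
    """Conventional rule quoted in the text: .05 and .01 levels."""
    if p > 0.05:
        return "not significant"
    if p < 0.01:
        return "significant"
    return "doubtful"

# Figures taken from Example 1.
sample_mean, universe_mean, sigma_x, n = 70.56, 69.943, 3.115, 100
delta = (sample_mean - universe_mean) / (sigma_x / math.sqrt(n))
p = two_sided_p(delta)
print(round(delta, 2), round(p, 4), verdict(p))
```

With these figures the probability lies between .01 and .05, so under the conventional rule the deviation belongs to the doubtful band rather than to either clear verdict.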

Example 2. Twelve dice were thrown 26,306 times (Weldon's data), and a
throw of 5 or 6 points was reckoned a success. The mean of the observed
distribution was found to be 4.0524. In tossing a true die the chance of scoring 5 or
6 is $\tfrac{1}{3}$ so the number of dice scoring 5 or 6 should be distributed with frequencies
proportional to the terms in the expansion $(\tfrac{2}{3} + \tfrac{1}{3})^{12}$. Therefore, the expected
mean, on the hypothesis that the dice were true, is $sp = 12(\tfrac{1}{3}) = 4$. Test this
hypothesis using the difference between the observed and theoretical means as
a criterion of judgment.

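Solution.    $\sigma_x = (spq)^{1/2} = \{(12)(\tfrac{1}{3})(\tfrac{2}{3})\}^{1/2} = 1.633$,

$N = 26{,}306$,    $\sigma_{\bar{x}} = \dfrac{\sigma_x}{N^{1/2}} = .010$,

$\delta = \dfrac{.0524}{.010} = 5.2$, approximately.

The probability that a deviation outside $\delta = \pm 5$ would happen by chance is
extremely small so we conclude that the dice were biased.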
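
The steps of this solution can be retraced numerically. The sketch below assumes the figures quoted in Example 2 and uses only the Python standard library; the variable names are illustrative.

```python
import math

dice = 12                    # s, the number of dice in each throw
p_success = 1.0 / 3.0        # chance that a true die scores 5 or 6
q_failure = 1.0 - p_success
throws = 26306               # N, the number of throws in Weldon's data
observed_mean = 4.0524
expected_mean = dice / 3     # sp = 4 for true dice

sigma_x = math.sqrt(dice * p_success * q_failure)   # about 1.633
sigma_mean = sigma_x / math.sqrt(throws)            # standard error of the mean
delta = (observed_mean - expected_mean) / sigma_mean
p_tail = math.erfc(delta / math.sqrt(2.0))          # two-tailed normal probability

print(round(sigma_x, 3), round(sigma_mean, 4), round(delta, 2), p_tail)
```

Carrying the unrounded standard error through gives a deviation slightly above 5 standard units, and the corresponding tail probability is far below any conventional level of significance.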

13. Size of Sample to Have a Given Reliability. From Theorem X
we may determine the size N of a sample such that its mean, $\bar{x}$, will
not differ from $\tilde{x}$ by more than a specified error $|\delta|$, with a degree of
certainty equal to a specified probability.


Example 3. The American Rolling Mill Company investigated 13 the life of 
ferrous materials under different corrosive conditions. Data obtained from a 
certain kind of sheet material immersed in Washington tap water showed that 
the average time of failure of such sample was 874.89 days and the standard 
deviation of the time of failure was 85.31 days. There arose the following 
question of practical interest to the research engineer of this company: What 
sample size N must be used in order that for similar test conditions, the prob¬ 
ability shall be 0.90 that the average time for failure determined from the N 
tests will be in error by not more than 5 per cent of the average of the universe? 

Assuming that $874.89 = \tilde{x}$ and that means of samples of N are distributed
normally, we may answer this question as follows: The allowable error is 5 per
cent of 874.89 days or 43.74 days, and this must correspond to a probability of
0.90. From Theorem X we have

$Q_\delta = \int_{-\delta}^{\delta} \phi(t)\,dt = 0.90$,

whence from the tables we find $\delta = 1.645$. Hence N is found by solving the
equation

$1.645\,\dfrac{\sigma_x}{\sqrt{N}} = 43.74$,

where $\sigma_x = 85.31$. We find N = 10.
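
The same calculation can be sketched in Python. This is an illustration only: the function name `sample_size` is hypothetical, and the tabular value of $\delta$ is replaced by the inverse normal function `statistics.NormalDist().inv_cdf` from the standard library.

```python
import math
from statistics import NormalDist

def sample_size(sigma_x, allowable_error, probability):
    """Solve delta * sigma_x / sqrt(N) = allowable_error for N.

    delta is the standardized deviation whose central area equals `probability`.
    """
    delta = NormalDist().inv_cdf(0.5 + probability / 2.0)
    return (delta * sigma_x / allowable_error) ** 2

# Figures from Example 3: 5 per cent of 874.89 days, probability 0.90.
n_exact = sample_size(85.31, 0.05 * 874.89, 0.90)
print(round(n_exact, 2), math.ceil(n_exact))
```

The exact solution is a little above 10, which the text reports as N = 10; rounding up to the next whole number would be the cautious choice if the stated probability must be fully attained.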

14. Difference in Proportions. In the analysis of data obtained 
by sampling, certain problems occur which relate to the significance 
of apparent differences in proportions. Suppose we have two random
samples of size $n_1$ and $n_2$, respectively, with $x_1$ individuals of the
$n_1$ items and $x_2$ of the $n_2$ items which have a certain character or
attribute. The question arises as to whether the observed difference
is merely an accident of sampling or whether a similar difference
exists in the universe. The following theorem may be used to test
the null hypothesis that $x_1/n_1$ and $x_2/n_2$ are random and independent
samples from the same universe.

Theorem XI. If $x_1/n_1$ and $x_2/n_2$ are random and independent
samples from an infinite universe in which p is the proportion of
individuals which have the character in question, the probability that the
difference in the proportions obtained will be numerically as great as the
observed difference $w = |x_1/n_1 - x_2/n_2|$ is approximately $P_\delta$, where
$P_\delta$ is defined in Theorem X, and

$\delta = \dfrac{w}{\left\{pq\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)\right\}^{1/2}}$.

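Proof. According to the Bernoulli theory, $x_1/n_1$ will vary about
an expected value p with variance $pq/n_1$, where $q = 1 - p$. Similarly,
$x_2/n_2$ will vary about p with variance $pq/n_2$. Then

$E(w) = E\left(\dfrac{x_1}{n_1} - p\right) - E\left(\dfrac{x_2}{n_2} - p\right) = 0$,

and from (10),

(23)    $\sigma_w^2 = \dfrac{pq}{n_1} + \dfrac{pq}{n_2}$.

Therefore, w varies about zero with variance given by (23), and the
ratio

(24)    $t = \dfrac{w}{\sigma_w} = \dfrac{w}{\left\{pq\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)\right\}^{1/2}}$

varies about zero with unit standard deviation.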





Information about the form of the t distribution may be obtained
from its higher moments. It is not difficult to show that

(25)    $\alpha_3^2 = \dfrac{1 - 4pq}{pq} \cdot \dfrac{(n_1 - n_2)^2}{n_1 n_2 (n_1 + n_2)}$,

        $\alpha_4 = 3 + \dfrac{1 - 6pq}{pq} \cdot \dfrac{n_1^2 - n_1 n_2 + n_2^2}{n_1 n_2 (n_1 + n_2)}$.

For fixed values of p and q, it is clear that $\alpha_3 \to 0$ and $\alpha_4 \to 3$ as the
samples are taken indefinitely large. Even for moderately small 
samples the distribution of t does not differ greatly from the normal 
form. The following empirical rule, suggested by E. S. Pearson, is 
useful when one is in doubt about the propriety of referring (24) 
to the normal probability scale. 

Rule. Suppose $n_1 < n_2$ (we are at liberty to call either $n_1$). If
$n_1 p > 5$, the use of the normal probability scale is justified. If
$n_1 p < 5$, examine $\alpha_3^2$. If $\alpha_3^2 < .04$, it is still sufficiently accurate.
But if $\alpha_3^2 > .04$, no great confidence can be placed in the test.

In order to apply Theorem XI an estimate of p is usually required.
For this purpose

(26)    $\hat{p} = \dfrac{x_1 + x_2}{n_1 + n_2}$

is usually taken as the best estimate of p which is available from
the samples. It is easy to show that $E(\hat{p}) = p$.
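
The test of Theorem XI, with p replaced by the pooled estimate (26), can be sketched as follows. The function name and the counts in the usage lines are hypothetical and serve only as an illustration; the normal tail area is computed with the standard library instead of a table.

```python
import math

def proportion_difference_test(x1, n1, x2, n2):
    """Return (p_hat, delta, P) for two observed proportions x1/n1 and x2/n2.

    p_hat is the pooled estimate (26); delta is the ratio (24) with p_hat
    substituted for p; P is the two-tailed normal probability of Theorem X.
    """
    p_hat = (x1 + x2) / (n1 + n2)
    q_hat = 1.0 - p_hat
    w = x1 / n1 - x2 / n2
    sigma_w = math.sqrt(p_hat * q_hat * (1.0 / n1 + 1.0 / n2))
    delta = w / sigma_w
    p_tail = math.erfc(abs(delta) / math.sqrt(2.0))
    return p_hat, delta, p_tail

# Hypothetical counts: 30 of 100 in one sample, 45 of 100 in the other.
p_hat, delta, p_tail = proportion_difference_test(30, 100, 45, 100)
print(round(p_hat, 3), round(delta, 3), round(p_tail, 4))
```

Before relying on the normal scale for small samples, the empirical rule stated above (the size of $n_1 p$ and, if necessary, $\alpha_3^2$) should still be consulted.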

References 

1. H. L. Rietz, Mathematical Statistics, p. 114. Open Court. 1927.

2. R. A. Fisher, Foundations of Theoretical Statistics. Phil. Trans. Royal Soc., vol. 222, (A), p. 309.

3. R. A. Fisher, Statistical Methods for Research Workers, § 11. Oliver and Boyd.

4. J. V. Uspensky, Introduction to Mathematical Probability. McGraw-Hill. 1937.

5. W. A. Shewhart, Economic Control of Quality of Manufactured Product, pp. 235-237. D. Van Nostrand. 1931.

6. Dunham Jackson, The Theory of Small Samples. American Mathematical Monthly, vol. 42, No. 6, pp. 344-364.

7. H. L. Rietz, Some Topics in Sampling Theory. Bulletin of the American Mathematical Society, April, 1937.

8. See Church, Biometrika, vol. 18, (1926), p. 357.

9. H. C. Carver, Fundamentals of the Theory of Sampling. Annals of Mathematical Statistics, vol. 1, No. 1, pp. 111-112.

10. Loc. cit., p. 183.

11. E. S. Pearson, The Test of Significance for the Correlation Coefficient. Journal of the American Statistical Association, vol. 26, (1931), p. 128.

12. R. A. Fisher, The Design of Experiments. Oliver and Boyd. 1937.

13. See Shewhart, loc. cit., p. 391.

14. B. H. Camp, Mathematical Statistics, p. 251. D. C. Heath and Co. 1931.

Problems 

1. Suppose a variable w is normally distributed and a value is selected at random.
Show that the odds are about 369 to 1 against the value differing
from E(w) by more than $3\sigma_w$.

2. (a) Consider a finite universe of 5 variates: $x_1, x_2, x_3, x_4, x_5$. The number
of distinct samples of 3 variates each that may be drawn is $C(5, 3) = 10$.
Write these down.

(b) Let $\bar{x}_i$ represent the ith sample mean and write down the 10 distinct
sample means. For example,

$\bar{x}_1 = \dfrac{x_1 + x_2 + x_3}{3}$.

(c) Show that the mean of the 10 values of $\bar{x}_i$ is the mean of the 5 values of
$x_i$. Thus,

$\dfrac{1}{10}\sum_{i=1}^{10} \bar{x}_i = \dfrac{1}{5}\sum_{i=1}^{5} x_i$.

What formula does this example illustrate?

3. Show that the expected value of $w^2$ is greater than the square of the expected
value of w.

4. From a box containing 2000 discs representing the distribution of span,
draw a sample of 25 and compute its mean and standard deviation. Test
the significance of the difference between your mean and the mean of the universe $\tilde{x} = 69.943$ inches.

5. Suppose the weights of a sample of 1000 men of the same age are obtained
yielding $\bar{x} = 140$ lbs. Assuming that $\sigma_x = 20.0$ lbs., what is the standard
error of the mean of this sample? What is the probability that this mean
does not differ from the mean of the universe at this age by more than
five pounds?

6. (Camp¹⁴) The mean age of death of men who are alive at age 20 is, in the
United States, 59.13. For the city of Chicago it is 58.98, and in 1910 the
male population of age 20 was 24,000. Can the difference between the
United States and Chicago be explained on the hypothesis of chance?
Assume $\sigma_x = 10$ years, and that the distribution of the universe is
approximately normal.

7. (Camp¹⁴) A fraternal organization wishes to be very sure that the average
age of death in its group of men now aged 20 will not differ from the
expected 59.13 years by more than one year. By “very sure” it means that
$Q_\delta$ must equal .999 or more. How large should the group be? (Assume
as before that $\sigma_x = 10$.)





8. Given that

$w = \sum_{i=1}^{k} (l_i + x_i)$.

If the x's are independent and $\sum_{i=1}^{k} l_i$ is a constant, show that

$\sigma_w^2 = \sum_{i=1}^{k} \sigma_i^2$,

where $\sigma_i^2$ represents the variance of $x_i$.

9. Find the mean value of all positive ordinates of the first quadrant of

$x^2 + y^2 = r^2$,

(a) when equally spaced along the x-axis,

(b) when equally spaced along the circle.

Answers: (a) $\pi r/4$; (b) $2r/\pi$.

10. Find the mean value of all the ordinates of the curve y = a + b x from 0 to x,
when equally spaced along the x-axis.

11. Derive (25). Hint. $\alpha_r = E(t^r) = \dfrac{E(w^r)}{\sigma_w^r}$.

12. Show that the moment relations in (21) reduce to the corresponding
relations in (12) and (14) if $M \to \infty$.

13. Suppose 300 mice having cancer of about the same degree of malignancy
were divided at random into two groups of $n_1 = 100$ and $n_2 = 200$,
respectively. The first group was given a certain serum treatment which
was withheld from the second group but otherwise the two groups were
treated alike. Among the serum treated there were $x_1 = 8$ deaths, and
among the other group there were $x_2 = 25$ deaths. Test the significance
of the difference between the mortality of 8% and 12½% in the two groups.

14. An instructor had two classes of 20 and 30 students in the same subject.
Four in the smaller class and 8 in the larger made grades of B or better.
Should one seek a further explanation of this difference beyond variation
due to sampling?



CHAPTER VII 


SMALL OR EXACT SAMPLING THEORY 


1. Introduction. A theory of sampling which assumes that N is 
large is inadequate for many practical problems. In recent years 
a theory has been developed to give more exact methods in dealing 
with small samples. In the practical field, the call for the solution 
of problems based on comparatively few observations was first 
realized in 1908 by a young man, then unknown, who chose to 
publish his results under the now celebrated pseudonym of “ Student.” 
Since then, many important contributions have been made toward 
the development and extension of this theory. Its applications are 
widespread. In the opinion of the present writer, continuity between
large and small sample theory is an essential part of the newer attitude.
In general, the methods of small sample theory
are applicable to large samples, although the reverse is not true.
It is our purpose in this chapter to facilitate an appreciation of
some of the simpler aspects of this theory. The treatment centers
around significance tests for means, variances, and correlation
coefficients.

2. Expected Value of $s^2$. By definition, the variance of a sample
is given by

(1)    $s^2 = \dfrac{x_1^2 + x_2^2 + \cdots + x_N^2}{N} - \bar{x}^2$.

Then the expected value of $s^2$ from repeated samples is

$E(s^2) = \dfrac{1}{N} E(x_1^2 + x_2^2 + \cdots + x_N^2) - E(\bar{x}^2)$.

Since the x's constitute a sample we may write

$E(x_1^2 + x_2^2 + \cdots + x_N^2) = N E(x^2)$,

and from (16) of Chapter VI, replacing y by x there, we have

$E(\bar{x}^2) = \dfrac{1}{N}\{E(x^2) + (N - 1)\tilde{x}^2\}$.

Therefore,

$E(s^2) = \dfrac{1}{N}\{N E(x^2)\} - \dfrac{1}{N}\{E(x^2) + (N - 1)\tilde{x}^2\} = \dfrac{N - 1}{N}\{E(x^2) - \tilde{x}^2\}$.

Hence

(2)    $E(s^2) = \dfrac{N - 1}{N}\sigma^2$,

where $\sigma^2$ is the variance of x.

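We may also obtain (2) as follows: Consider independent samples
each containing N variates $u_1, u_2, \ldots, u_N$, where $u_i = x_i - \tilde{x}$. For
any sample,

$s^2 = \dfrac{1}{N}\sum u_i^2 - \bar{u}^2 = \dfrac{1}{N}\sum u_i^2 - \dfrac{1}{N^2}\sum u_i^2 - \dfrac{2}{N^2}\sum_{i<j} u_i u_j$,

since the square of a sum is equal to the sum of the squares plus twice
the cross-products. Then

$E(s^2) = \dfrac{1}{N}E\left(\sum u_i^2\right) - \dfrac{1}{N^2}E\left(\sum u_i^2\right) - \dfrac{2}{N^2}E\left(\sum_{i<j} u_i u_j\right)$.

By Proposition III of Chapter VI the right-hand member of the
above expression may be written

$\dfrac{1}{N}\sum E(u_i^2) - \dfrac{1}{N^2}\sum E(u_i^2) - \dfrac{2}{N^2}\sum_{i<j} E(u_i u_j)$,

which becomes

$\dfrac{N\sigma^2}{N} - \dfrac{N\sigma^2}{N^2} - \dfrac{2}{N^2}\sum_{i<j} E(u_i u_j)$.

Since $E(u_i u_j) = 0$, by Proposition V, we have the final result

$E(s^2) = \dfrac{N - 1}{N}\sigma^2$.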


This result is sometimes stated as in the following theorem. 





Theorem I. The mean of the sampling distribution of $s^2$ from an
arbitrary universe equals the variance of the universe multiplied by the
factor* $(N - 1)/N$.
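
Theorem I can be checked exactly on a small discrete universe by listing every possible ordered sample. The sketch below is an illustration only: the universe (the six faces of a die, equally likely) and the sample size N = 3 are assumptions chosen for convenience, and drawing with replacement stands in for sampling from an infinite universe.

```python
from fractions import Fraction
from itertools import product

def sample_variance(values):
    """Variance with divisor N, as in definition (1) of the text."""
    n = len(values)
    mean = Fraction(sum(values), n)
    return sum((Fraction(v) - mean) ** 2 for v in values) / n

universe = [1, 2, 3, 4, 5, 6]          # assumed universe, equally likely values
N = 3                                  # assumed sample size
sigma2 = sample_variance(universe)     # variance of the universe

# Every ordered sample of N values drawn with replacement is equally likely.
samples = list(product(universe, repeat=N))
expected_s2 = sum(sample_variance(s) for s in samples) / len(samples)

print(expected_s2, Fraction(N - 1, N) * sigma2)
```

Exact rational arithmetic is used so that the two printed quantities can be compared without rounding error.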

It is to be anticipated that the expected value of $s^2$ is less than $\sigma^2$,
as the following analysis will show. The variance $\sigma^2$ refers to deviations
from $\tilde{x}$, whereas any $s^2$ refers to deviations from an $\bar{x}$. For any
sample, then, we may regard $\tilde{x}$ as an arbitrary origin. Since in the
case of any sample, the sum of the squares of deviations from its
mean, $\bar{x}$, is less than the sum of the squares of deviations of the same
variates from an arbitrary point $\tilde{x}$ (unless the sample is one whose
mean falls at $\tilde{x}$), it is to be expected that the mean of all the values
of $s^2$ will be less than $\sigma^2$. Relation (2) measures the extent of this
inequality.

3. Unbiased Estimates of Population Parameters. A distribution 
function is not only a function of the variable involved, but it is also 
a function of the parameters, or hypothetical quantities, which are 
introduced to specify the universe sampled. In the case of a 
Bernoulli distribution the parameter is p, in the Poisson law
it is m, and in a normal distribution there are two parameters,
$\tilde{x}$ and $\sigma$.

A function of the variates given by a sample for estimating a
parameter is called a statistic. Let $\hat{\theta}$ be a statistic corresponding to
a parameter $\theta$ in the universe. We now state the following

Definition. If the expected value of $\hat{\theta}$, $E(\hat{\theta})$, equals $\theta$ then $\hat{\theta}$ is
called an unbiased estimate of $\theta$.

It is clear from Theorem I of Chapter VI that the mean of a sample 
is an unbiased estimate of the mean of the universe. Also from (26)
of Chapter VI we see that the pooled proportion defined there is an unbiased estimate
of p.

Before the relation $\sigma_{\bar{x}}^2 = \sigma_x^2/N$ can be of much use to us in the
applications we must have an estimate of $\sigma_x^2$ from the sample or
samples available. By Proposition I of the preceding chapter,

$E\left\{\dfrac{N}{N - 1}s^2\right\} = \dfrac{N}{N - 1}E(s^2) = \sigma^2$ by (2).


* This factor is sometimes called “ Bessel’s correction.” Perhaps it should 
be attributed more appropriately to Gauss who made use of it, in this connec¬ 
tion, as early as 1823. 




Let $\hat{\sigma}^2$ be an unbiased estimate of $\sigma^2$. If this estimate¹ is based on
a single sample we have

(3)    $\hat{\sigma}^2 = \dfrac{\sum_{i=1}^{N}(x_i - \bar{x})^2}{N - 1}$.

If n = N - 1 it is obvious that

(3a)    $s^2 = \dfrac{n}{n + 1}\hat{\sigma}^2$.

It is conventional² to take

(4)    $\hat{\sigma} = \left[\dfrac{N}{N - 1}\right]^{1/2} s$

as an estimate of $\sigma$. If N is large the difference between unity and
the coefficient of s in (4) is negligible in numerical problems. With
N large it would not be invalid, to any appreciable extent, to use s
as an estimate of $\sigma$.

If two independent samples are available from the same universe,
an unbiased estimate based on the two samples is given by

(5)    $\hat{\sigma}^2 = \dfrac{q}{N - 2}$,

where

$q = N_1 s_1^2 + N_2 s_2^2$,    $N = N_1 + N_2$,

$s_1^2$ and $s_2^2$ being the variances of samples consisting of $N_1$ and $N_2$
variates, respectively. It is left as an exercise for the student to
verify that the expected value of $q/(N - 2)$ is $\sigma^2$.

In case k independent samples are available from the same universe,
we may generalize (5) and write

(6)    $\hat{\sigma}^2 = \dfrac{Q}{U - k}$,

where

$Q = N_1 s_1^2 + N_2 s_2^2 + \cdots + N_k s_k^2$,
$U = N_1 + N_2 + \cdots + N_k$,

and $s_i^2$ is the variance in the ith sample consisting of $N_i$ variates.
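
As a sketch of how the pooled estimate (6) might be computed, the following Python fragment forms Q, U, and k from a list of samples. The function names and the three small samples are hypothetical and serve only to illustrate the arithmetic.

```python
def sample_variance(values):
    """Variance of one sample with divisor N_i (the s_i^2 of the text)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n

def pooled_variance_estimate(samples):
    """Estimate (6): Q / (U - k) for k independent samples."""
    q_total = sum(len(s) * sample_variance(s) for s in samples)   # Q
    u_total = sum(len(s) for s in samples)                        # U
    k = len(samples)
    return q_total / (u_total - k)

# Hypothetical samples of unequal size.
samples = [[2, 4, 6], [1, 3], [5, 5, 8, 10]]
print(pooled_variance_estimate(samples))
```

With a single sample the function reduces to (3), and with two samples to (5).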




When $\hat{\sigma}^2$ is used in future discussions it will be clear from the context
whether this estimate is based on 1, 2, or k samples.

If $N_i = N$ is the same for every sample, (6) reduces to

(7)    $\hat{\sigma}^2 = \dfrac{N(s_1^2 + s_2^2 + s_3^2 + \cdots + s_k^2)}{U - k}$,

where U = Nk. Clearly, (7) may be written in the form

(7a)    $\dfrac{N - 1}{N}\hat{\sigma}^2 = \dfrac{1}{k}(s_1^2 + s_2^2 + s_3^2 + \cdots + s_k^2)$.

When k is taken infinitely large so that U becomes the universe, the
right member of (7a) then refers to the expected value of $s^2$, and $\hat{\sigma}^2$
becomes $\sigma^2$ itself. So as $k \to \infty$ the limiting value of (7a) becomes

$\dfrac{N - 1}{N}\sigma^2 = E(s^2)$,

as given in (2).

As an alternate to (7), in the case where all samples contain the
same number of variates, we may take

(8)    $\hat{\sigma} = \dfrac{1}{b(N)} \times \dfrac{1}{k}(s_1 + s_2 + s_3 + \cdots + s_k)$

       $= \dfrac{1}{b(N)} \times$ mean value of standard deviations,

where b(N) is a function of N and approaches unity as N increases.
The exact expression for b(N) will be derived in § 7. Its approximate
value is $b(N) = 1 - 3/(4N)$. As $k \to \infty$ the limiting value of
(8) becomes

$\sigma = \dfrac{E(s)}{b(N)}$.

In § 7 we will prove that $b(N)\sigma$ is the mean of the sampling distribution
of s from a normal universe whose standard deviation is $\sigma$.
Values of b(N) and its reciprocal have been tabulated by E. S.
Pearson³ and others,⁴ and we have included a short table in § 7.

As an alternate to (4) we have from (8) when k = 1,

$\hat{\sigma} = \dfrac{s}{b(N)}$.



4. Degrees of Freedom. In § 2 we have proved, essentially, that
the expected value of $\sum (x_i - \bar{x})^2$ is $(N - 1)\sigma^2$, where the N values
of x in the sample are subject to the linear restriction $\sum x_i = N\bar{x}$.
This is equivalent to proving that the expected value of $\sum x_i^2$ is
$(N - 1)\sigma^2$ when the x's are subject to the linear restriction $\sum x_i = 0$.
Suppose, however, that there are k < N linear restrictions on the x's.
What, then, is the expected value of $\sum x_i^2$? A. T. Craig⁵ has proved
analytically that if $x_1, x_2, \ldots, x_N$ are N independent values of a
variable which is normally distributed about zero with variance $\sigma^2$
and if the N values of x are subject to k < N homogeneous linear
restrictions, then the expected value of $\sum x_i^2$ is $(N - k)\sigma^2$. The number
n = N - k is frequently called the number of degrees of freedom.

5. “Student's” Distribution. The formula used in testing a null
hypothesis that a given sample comes from a universe with a proposed
mean is

(11)    $t = \dfrac{\bar{x} - \tilde{x}}{\sigma/\sqrt{N}}$.

As stated in Chapter VI, (11) is normally distributed if the universe
is normal. On the side of applications, $\sigma$ is seldom available and
usually must be estimated from the data available. If we substitute
into (11) the estimate of $\sigma$ given in (4) and calculate

(12)    $t = \dfrac{(\bar{x} - \tilde{x})(N - 1)^{1/2}}{s}$,

we are not justified in asserting that (12) is normally distributed
unless N is large. And so, in testing the significance of the mean
of a small sample we are not justified in referring (12) to a normal
probability scale. The variability of s from sample to sample
invalidates that procedure.
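
That (12) is simply (11) with $\sigma$ replaced by the estimate (4) can be confirmed numerically. The sketch below is an illustration under stated assumptions: the function names, the five observations, and the proposed universe mean are all hypothetical.

```python
import math

def t_from_s(sample, universe_mean):
    """Form (12): (mean - universe_mean) * sqrt(N - 1) / s, with s using divisor N."""
    n = len(sample)
    mean = sum(sample) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in sample) / n)
    return (mean - universe_mean) * math.sqrt(n - 1) / s

def t_from_estimate(sample, universe_mean):
    """Form (11) with sigma replaced by the estimate (4), sqrt(N / (N - 1)) * s."""
    n = len(sample)
    mean = sum(sample) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in sample) / n)
    sigma_hat = math.sqrt(n / (n - 1)) * s
    return (mean - universe_mean) / (sigma_hat / math.sqrt(n))

sample = [68.1, 71.4, 69.9, 72.3, 70.2]   # hypothetical observations
proposed_mean = 69.0                       # hypothetical universe mean
print(t_from_s(sample, proposed_mean), t_from_estimate(sample, proposed_mean))
```

The two forms agree for any sample; what changes with small N is the distribution to which the resulting ratio must be referred.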

While Helmert obtained the distribution of $s^2$ as early as 1876
it seems that “Student”⁶ was the first to recognize the importance,
for the theory of small samples, of taking account of the variability
of s in (12). By means of a remarkable intuition he obtained, somewhat
empirically, the joint distribution function of $\bar{x}$ and s from
a normal universe. Later writers, notably Fisher, established his
results rigorously.

“Student” actually found the distribution of a slightly different
variable, viz.,

(13)    $z = \dfrac{\bar{x} - \tilde{x}}{s}$.

Obviously, z is functionally related to t by

(14)    $z = t(N - 1)^{-1/2}$,

so the distribution of t can easily be obtained from that of z. In
deriving the distribution of z we shall follow the proof given by
Fisher.⁷ To avoid interrupting the main development some of the
details will be deferred to the next section.

Consider a normal universe with frequency element

$df = (2\pi\sigma^2)^{-1/2} e^{-(x - \tilde{x})^2/2\sigma^2}\,dx$.

Let a sample $(x_1, x_2, \ldots, x_N)$ be taken at random from it. Then the
probability that the sample will lie in the element of volume

$dv = dx_1\,dx_2 \cdots dx_N$

is

(15)    $dF = (2\pi\sigma^2)^{-N/2} e^{-V^2/2\sigma^2}\,dv$,

where $V^2 = \sum_{i=1}^{N}(x_i - \tilde{x})^2$. From the relation* $\mu_2 = \nu_2 - \nu_1^2$ we have

$V^2 = Ns^2 + N(\bar{x} - \tilde{x})^2$.

Hence (15) may be written

(16)    $dF = (2\pi\sigma^2)^{-N/2} e^{-\{Ns^2 + N(\bar{x} - \tilde{x})^2\}/2\sigma^2}\,dv$.

By means of N-dimensional geometry (to be explained in § 6) Fisher
showed that the element of volume dv can be expressed in terms of
the variation of $\bar{x}$, namely, $d\bar{x}$, and the variation in volume, $dv_{(N-1)}$,
of an (N - 1)-dimensional hypersphere of radius $(N)^{1/2}s$, so that

(17)    $dv = C s^{N-2}\,d\bar{x}\,ds$,

where C is a constant. Then (16) becomes

(18)    $dF = k\,e^{-\{Ns^2 + N(\bar{x} - \tilde{x})^2\}/2\sigma^2} s^{N-2}\,ds\,d\bar{x}$.

From (18) the distribution of z can be deduced. From (13) we
obtain $d\bar{x} = s\,dz$ for a fixed value of s. Substituting in (18) we
obtain, for the joint distribution of s and z,

(19)    $dF = k\,e^{-Ns^2(1 + z^2)/2\sigma^2} s^{N-1}\,ds\,dz$.

This expression is defined for $s \geq 0$, being identically zero for s < 0
since s is taken as the positive square root of $s^2$. If s is integrated

* Cf. Part I, Ch. IV, § 9. 




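out of (19), the distribution of the single variable z is obtained. To
perform this integration, let

$y = s(1 + z^2)^{1/2}$,    $ds = (1 + z^2)^{-1/2}\,dy$,

and integrating with respect to y from 0 to $\infty$, we have

$k\left\{\int_0^\infty y^{N-1} e^{-Ny^2/2\sigma^2}\,dy\right\}(1 + z^2)^{-N/2}\,dz$,

which reduces to

$K(1 + z^2)^{-N/2}\,dz$,

where, as shown in § 4 of Chapter II,

(20)    $K = \dfrac{\Gamma\left(\dfrac{N}{2}\right)}{\sqrt{\pi}\,\Gamma\left(\dfrac{N - 1}{2}\right)}$.

Therefore, the distribution function for “Student's” z is

(21)    $dF = \dfrac{\Gamma\left(\dfrac{N}{2}\right)}{\sqrt{\pi}\,\Gamma\left(\dfrac{N - 1}{2}\right)}(1 + z^2)^{-N/2}\,dz$.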

The curve is symmetrical with mean zero and infinite range. It is
quite different, however, in mathematical character from the normal
curve although it approaches this form as $N \to \infty$. (Cf. § 9.)
From the viewpoint of sampling theory the important property of
(21) is its independence of $\sigma$. The revolutionary character of this
property is revealed in certain applications that involve drawing
probable inferences from small samples, say from a sample of N = 10.

Using (14) Fisher modified (21) and obtained the distribution
of t which is the one now widely applied. Before discussing the
t-distribution, we shall give the details of Fisher's derivation of (21)
and consider the separate distributions of $\bar{x}$, $s^2$, and s.
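
A numerical check of the z-curve may be helpful here. The sketch below assumes that the constant of (21) is $\Gamma(N/2)/\{\sqrt{\pi}\,\Gamma((N-1)/2)\}$, which is the value required for the total area to be unity, and verifies that area by the trapezoidal rule for the illustrative case N = 10. The function names, the integration range, and the step count are assumptions of the sketch.

```python
import math

def student_z_density(z, n):
    """Ordinate of the z-curve for samples of n: K * (1 + z^2)^(-n/2)."""
    k = math.gamma(n / 2.0) / (math.sqrt(math.pi) * math.gamma((n - 1) / 2.0))
    return k * (1.0 + z * z) ** (-n / 2.0)

def trapezoid_area(f, a, b, steps):
    """Simple trapezoidal rule for the area under f between a and b."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h

n = 10
area = trapezoid_area(lambda z: student_z_density(z, n), -20.0, 20.0, 20000)
print(round(area, 6))
```

Note that $\sigma$ does not enter the function at all, which is the property emphasized in the paragraph above.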

6. Fisher's Derivation. Making use of the geometrical method
employed by Fisher⁷ we shall imagine an N-dimensional space in
which we take the origin at the point $O(\tilde{x}, \tilde{x}, \ldots, \tilde{x})$ and rectangular
axes $Ou_1, Ou_2, \ldots, Ou_N$. A point can be located in a space of a




specified number of dimensions by associating with the point a set
of numbers. Therefore, we may represent the sample by the point
$P(u_1, u_2, \ldots, u_N)$ where $u_i = x_i - \tilde{x}$. Although it is impossible to
visualize a space of N dimensions for N > 3 we will carry through the
argument for the general case by analogy with the case for N = 3.
So we consider the latter case first.

When N = 3, the sample is represented by the point $P(u_1, u_2, u_3)$
and we have the mean $\bar{u}$ and variance $s^2$ defined by

(a)    $u_1 + u_2 + u_3 = 3\bar{u}$

and

(b)    $(u_1 - \bar{u})^2 + (u_2 - \bar{u})^2 + (u_3 - \bar{u})^2 = 3s^2$.

For an assigned $\bar{u}$, (a) represents a plane; and, for an assigned pair
of values of $(\bar{u}, s)$, (b) represents a sphere with center at the point
$M(\bar{u}, \bar{u}, \bar{u})$. The line

(c)    $u_1 = u_2 = u_3$

has direction cosines each equal to $1/(3)^{1/2}$ and is normal to the
plane (a). The perpendicular distance of P from this line is

$MP = s(3)^{1/2}$

as can be seen from (b). We require the probability, to within
infinitesimals of order higher than $d\bar{u}\,ds$, of getting a sample of





N = 3 independent values of u which will simultaneously yield
values of $\bar{u}$ and s which lie within the region bounded by $\bar{u}$, $\bar{u} + d\bar{u}$
and s, s + ds. Following the method of § 5, an element of this
probability density is given by the expressions

$dF = (2\pi\sigma^2)^{-3/2} e^{-(u_1^2 + u_2^2 + u_3^2)/2\sigma^2}\,dv$
$\quad\;\, = (2\pi\sigma^2)^{-3/2} e^{-3(\bar{u}^2 + s^2)/2\sigma^2}\,dv$

where

$dv = du_1\,du_2\,du_3$.

As the sample point $P(u_1, u_2, u_3)$ varies, $\bar{u}$ and s also vary.
Corresponding to different values of s there are a set of concentric spheres
defined by (b) all having the same center. Since the plane (a)
passes through the common center of the spheres, the region dv is
a shell between concentric spheres of radii $\sqrt{3}\,s$ and $\sqrt{3}\,(s + ds)$.
To use a homely illustration, dv corresponds to one of the successive
layers in an onion. Our problem is to express dv in terms of $\bar{u}$, s,
$d\bar{u}$, and ds. Now the line (c) meets the plane (a) at M and the
distance OM is

$OM = \bar{u}(3)^{1/2}$

so we have the differential element

$d(OM) = (3)^{1/2}\,d\bar{u}$.

Since the plane (a) passes through M, the intersection of the plane
and sphere is a great circle with center at M and radius equal to
$s(3)^{1/2}$. The area of this circle is

$A = 3\pi s^2$

and the differential element dA is

$dA = 6\pi s\,ds$.

Therefore, within infinitesimals of higher order,

$dv = dA\,d(OM) = C_1 s\,ds\,d\bar{u}$

where here and hereafter, in this section, the C's are constants. So,
the required probability is

$dF = C_2 e^{-3(s^2 + \bar{u}^2)/2\sigma^2} s\,ds\,d\bar{u}$.

Passing now to the general case involving N-space, let P be the






point representing the sample $(u_1, u_2, \ldots, u_N)$. Then PM is the
perpendicular from P upon the line

(d)    $u_1 = u_2 = \cdots = u_N$

and we have

$OM = (N)^{1/2}\bar{u}$,    $OP^2 = \sum u_i^2$,
$MP^2 = OP^2 - OM^2 = \sum u_i^2 - N\bar{u}^2 = Ns^2$.

In N-space, the plane (a) generalizes into the hyperplane

(e)    $\sum u_i = N\bar{u}$,

and the sphere (b) generalizes into the hypersphere

(f)    $\sum (u_i - \bar{u})^2 = Ns^2$

with radius $MP = (N)^{1/2}s$ and center at $(\bar{u}, \bar{u}, \ldots, \bar{u})$. Now, the
hyperplane (e) will intersect the hypersphere (f) in an (N - 1)-dimensional
hypersphere to correspond to the circle for the case
N = 3. Consequently, for a given pair of values of $\bar{u}$ and s, the
point P will lie on an (N - 1)-dimensional hypersphere orthogonal to
the line OM. The volume of this (N - 1)-hypersphere is given by

$A = C_3(\sqrt{N}\,s)^{N-1}$

and so

$dA = C_4 s^{N-2}\,ds$.

Therefore, the volume $dv = du_1\,du_2 \cdots du_N$ between two concentric
spheres of radius $\sqrt{N}\,s$ and $\sqrt{N}\,(s + ds)$ is approximately

$dv = dA\,d(OM) = C s^{N-2}\,ds\,d\bar{u}$.

Since $du_i = dx_i$ and $d\bar{u} = d\bar{x}$, (17) is established.

7. Distributions of $\bar{x}$, $s^2$, and s, Taken Singly. It is clear that
(18) may be written as follows:

(22)    $dF = \dfrac{k}{2}\,e^{-N\{s^2 + (\bar{x} - \tilde{x})^2\}/2\sigma^2}(s^2)^{(N-3)/2}\,d(s^2)\,d\bar{x}$

        $= k_1 e^{-N(\bar{x} - \tilde{x})^2/2\sigma^2}\,d\bar{x} \times k_2 e^{-Ns^2/2\sigma^2}(s^2)^{(N-3)/2}\,d(s^2)$.

From this factored form it follows that

(a) The law of distribution, $G(\bar{x})$, of sample means from a normal
universe is given by

(23)    $G(\bar{x}) = k_1 e^{-N(\bar{x} - \tilde{x})^2/2\sigma^2}$,



it being fairly obvious from the form of $G(\bar{x})$ that

(24)    $k_1 = \left(\dfrac{N}{2\pi\sigma^2}\right)^{1/2}$.

It may also be evaluated by imposing the condition that

$\int_{-\infty}^{\infty} G(\bar{x})\,d\bar{x} = 1$.

Evidently, $G(\bar{x})$ is a normal distribution with mean equal to $\tilde{x}$ and
standard deviation equal to $\sigma/(N)^{1/2}$, a result already familiar from
Chapter VI.

(b) The variance, $s^2$, of a sample is distributed according to

(25)    $H(s^2) = k_2 e^{-Ns^2/2\sigma^2}(s^2)^{(N-3)/2}$,

where (see § 4, Chapter II)

(26)    $k_2 = \dfrac{\left(\dfrac{N}{2\sigma^2}\right)^{(N-1)/2}}{\Gamma\left(\dfrac{N - 1}{2}\right)}$.

Thus the distribution of the variance was found by first finding the
simultaneous distribution of the variance and mean. Clearly,
$H(s^2)$ is a Pearson Type III curve with range limited at one end,
$s^2 = 0$, and not at the other, $s^2 = \infty$.

(c) The variance, $s^2$, and the mean, $\bar{x}$, are distributed quite independently,
that is,

$F(\bar{x}, s^2) = G(\bar{x})H(s^2)$.

It has recently been proved by Geary⁸ that a necessary and sufficient
condition that $\bar{x}$ and $s^2$ from samples of N values of x be independent
in the probability sense is that the x's be normally distributed in the
parent universe.

In § 2, the mean of the sampling distribution of $s^2$ from an arbitrary
universe was obtained. It is interesting to verify that result in the
present case where we know the distribution function. The mean
of the distribution of variances of samples of N from a normal
universe is given by

$E(s^2) = \int_0^\infty H(s^2)\,s^2\,d(s^2)$,






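where $H(s^2)$ is defined in (25). So we have

$E(s^2) = k_2 \int_0^\infty e^{-Ns^2/2\sigma^2}(s^2)^{(N-1)/2}\,d(s^2)$

$\qquad\;\; = k_2\left(\dfrac{2\sigma^2}{N}\right)^{(N+1)/2}\Gamma\left(\dfrac{N + 1}{2}\right) = \dfrac{N - 1}{N}\sigma^2$.

The standard deviation of the $H(s^2)$ distribution is, approximately,

$\sigma^2\left(\dfrac{2}{N}\right)^{1/2}$.

The distribution of the standard deviations of samples of N from
a normal universe is readily found from (25) and (22) to be

(27)    $h(s) = 2k_2 e^{-Ns^2/2\sigma^2} s^{N-2}$.

So its mean value is given by

$E(s) = \int_0^\infty h(s)\,s\,ds$,

which yields the result

$E(s) = k_2\left(\dfrac{2\sigma^2}{N}\right)^{N/2}\Gamma\left(\dfrac{N}{2}\right)$.

Upon substituting the value of $k_2$ given in (26), the above expression
becomes

$E(s) = \left(\dfrac{2}{N}\right)^{1/2}\dfrac{\Gamma\left(\dfrac{N}{2}\right)}{\Gamma\left(\dfrac{N - 1}{2}\right)}\,\sigma$.

If we denote this coefficient of $\sigma$ by b(N) we have the relation

$\sigma = \dfrac{E(s)}{b(N)}$

which was promised in § 3. Romanovsky⁹ showed that

$b(N) = 1 - \dfrac{3}{4N} - \dfrac{7}{32N^2} - \cdots$.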




Table 11 gives values of the reciprocal of b(N) for a few values of N.

Table 11

      N      1/b(N)
      2      1.772
      3      1.382
      4      1.253
      5      1.189
      6      1.151
      7      1.126
      8      1.108
      9      1.094
     10      1.084
     20      1.040
     30      1.026
     50      1.015
    100      1.008

Romanovsky also deduced the standard deviation of the h(s)
distribution to be

    $\sigma \left(\frac{1}{2N} - \frac{1}{8N^2} - \frac{3}{16N^3}\right)^{1/2}.$

The approximate value

    $\sigma \left(\frac{1}{2N}\right)^{1/2}$

is frequently used in practice and this is the basis for the common
statement that the standard error of a standard deviation is
$1/(2)^{1/2}$ that of a mean.

The modal value of s, easily found* by differentiating h(s), is

(30)    $\check{s} = \sigma \{(N-2)/N\}^{1/2}.$

If we make the substitution $y = s - \check{s}$, then the distribution of y is,
to a first approximation, the normal curve

(31)    $\text{Const.} \times e^{-Ny^2/\sigma^2}$

with standard deviation $\sigma/(2N)^{1/2}$.
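
The entries of Table 11 can be checked numerically. The sketch below assumes that s is computed with divisor N and that the coefficient is $b(N) = (2/N)^{1/2}\,\Gamma(N/2)/\Gamma\{(N-1)/2\}$, and it compares this with the first terms of the series $1 - 3/(4N) - 7/(32N^2)$. The function names are illustrative only.

```python
import math

def b(N):
    """Coefficient b(N) = E(s)/sigma for samples of N from a normal universe,
    where s is the standard deviation computed with divisor N."""
    # Log-gamma keeps the ratio of Gamma functions stable for large N.
    log_ratio = math.lgamma(N / 2) - math.lgamma((N - 1) / 2)
    return math.sqrt(2 / N) * math.exp(log_ratio)

def b_series(N):
    """First three terms of the series expansion of b(N) in powers of 1/N."""
    return 1 - 3 / (4 * N) - 7 / (32 * N ** 2)

for N in (2, 3, 4, 5, 10, 20, 30, 50, 100):
    print(f"N={N:3d}  1/b(N)={1 / b(N):.3f}  "
          f"b(N)={b(N):.5f}  series={b_series(N):.5f}")
```

The series is only an approximation for small N, so the last two columns agree closely only once N is moderately large.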

8. The ($\bar{x}$, s)-Frequency Surface. We may regard $F(\bar{x}, s)$ as
describing a frequency surface if the total volume under the surface
represents the means and standard deviations of all possible samples
of size N. In depicting this surface it is convenient to let
$u = \bar{x} - \tilde{x}$ so that the origin of u is at $\bar{x} = \tilde{x}$.

Since

    $\int_0^\infty \int_{-\infty}^{\infty} F(\bar{x}, s)\, d\bar{x}\, ds = 1,$

then the volume under the surface over a closed contour in the
$\bar{x}s$-plane represents the proportion or percentage of samples whose
means and standard deviations fall simultaneously within the ranges
defined by the boundary of the given contour. In an illuminating
paper 11 by Deming and Birge two such frequency surfaces are
represented. These are reproduced in Figure 21, one for a small value
of N and the other for a comparatively large value of N.

* If we make h(s) a maximum for variation in $\sigma$ we find that
$\sigma = N^{1/2} s/(N-1)^{1/2}$ or $s = \sigma\{(N-1)/N\}^{1/2}$ (Cf. Rider 10).



[Figure 21. Left panel: N small, about 10. Right panel: N larger, about 30.]

Fig. 21. The Surface $F(\bar{x}, s)$ Illustrated by Sections

As the authors point out, the highest point of the surface has the
coordinates $u = 0$, $s = \sigma\{(N-2)/N\}^{1/2}$. Because of the
independence of $\bar{x}$ and s, all plane sections s = constant will be normal
curves with standard deviations equal to $\sigma/(N)^{1/2}$. The u = constant
sections will be skew curves whose equations are given by h(s).
They will all have the same mean and mode. As N increases, their
mean and mode approach coincidence with the value $\sigma$ while the
curves lose their skewness and become normal with center at $s = \sigma$
and standard deviations equal to $\sigma/(2N)^{1/2}$. As N increases, the
surface becomes more and more concentrated about the point
$u = 0$, $s = \sigma$.

9. Fisher's t-Distribution. Substituting (14) into (21) and
replacing N - 1 by n we obtain

(32)    $F_n(t) = K_n \left(1 + \frac{t^2}{n}\right)^{-(n+1)/2}$

where $1/K_n = n^{1/2} B(n/2, 1/2)$, B being the Beta function.

Inasmuch as (32) is independent of $\sigma$, it can be used in situations
in which the value of $\sigma$ is unknown. The quantity t involves no
hypothetical quantities, being completely expressible in terms of the
variates.

In 1925, “Student” published in Metron 12 an extensive table of
the probability integral $\int_{-\infty}^{t} F_n(t)\, dt$. More recently, Fisher 13 has
given a short table of the probability P of occurrence of deviations
outside $\pm t$, for values of t and n commonly met in applications of
small sample theory. Let

    $P_n(t) = 2\int_0^t F_n(t)\, dt.$

Then the probability P tabulated by Fisher is

(33)    $P = 1 - P_n(t).$

Fisher’s table gives successive columns showing for each value of n, 
from n = 1 to n = 30, the values of t for which P takes the values 
given at the head of the columns. A general idea of the table may be 
obtained from the portion which we have reproduced* in Table 12. 


Table 12. Values of t from Table IV of Fisher's Text

    n \ P     .9       .7       .5       .1       .05      .01
      3      .137     .424     .765    2.353    3.182    5.841
      4      .134     .414     .741    2.132    2.776    4.604
      5      .132     .408     .727    2.015    2.571    4.032
      6      .131     .404     .718    1.943    2.447    3.707
      8      .130     .399     .706    1.860    2.306    3.355
     10      .129     .397     .700    1.812    2.228    3.169
     15      .128     .393     .691    1.753    2.131    2.947
     20      .127     .391     .687    1.725    2.086    2.845
     30      .127     .389     .683    1.697    2.042    2.750
      ∞      .1257    .3853    .6745   1.6449   1.9600   2.5758



The number n, with which to enter the table, is determined by the
number of degrees of freedom involved in the available estimate (§ 3)
of $\sigma^2$. In testing null hypotheses the rule given in § 12 of Chapter VI
may be used, where, of course, P s is to be replaced now by P. 
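
As a rough check on how the tabulated values of t relate to P, the following sketch integrates the density $F_n(t) = K_n(1 + t^2/n)^{-(n+1)/2}$, with $1/K_n = n^{1/2}B(n/2, 1/2)$, by Simpson's rule and forms $P = 1 - 2\int_0^t F_n(t)\,dt$. The function names and the number of integration steps are illustrative assumptions, not part of the text.

```python
import math

def student_density(t, n):
    """F_n(t) = K_n (1 + t^2/n)^(-(n+1)/2), with 1/K_n = sqrt(n) * B(n/2, 1/2)."""
    log_beta = math.lgamma(n / 2) + math.lgamma(0.5) - math.lgamma((n + 1) / 2)
    k_n = 1.0 / (math.sqrt(n) * math.exp(log_beta))
    return k_n * (1 + t * t / n) ** (-(n + 1) / 2)

def prob_outside(t, n, steps=2000):
    """P = 1 - 2 * integral of F_n from 0 to t, by Simpson's rule (steps even)."""
    h = t / steps
    total = student_density(0.0, n) + student_density(t, n)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * student_density(i * h, n)
    return 1 - 2 * total * h / 3

# Two entries of Table 12: the results should come out close to .05 and .01.
print(prob_outside(2.776, 4))
print(prob_outside(3.169, 10))
```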

* With Fisher's permission and that of his publishers, Oliver and Boyd.

The distribution of t (as well as that of z) approaches the normal
type as $n \to \infty$. This may be established as follows. Using Stirling's
approximation on the coefficient $K_n$ in (32), we obtain, after some
algebraic simplification, the following expression:

    $K_n = e^{-1/2} \left(\frac{n-1}{n-2}\right)^{(n-2)/2} \left(\frac{n-1}{n-2}\right)^{1/2} \left(\frac{n-1}{2\pi n}\right)^{1/2}.$

From this it is easy to show that

    $\lim_{n \to \infty} K_n = (2\pi)^{-1/2}.$

The rest of the t function may be written as

    $\left[\left(1 + \frac{t^2}{n}\right)^{n}\right]^{-1/2} \left(1 + \frac{t^2}{n}\right)^{-1/2}$

which, when $n = \infty$, becomes $e^{-t^2/2}$. Therefore,

    $\lim_{n \to \infty} F_n(t) = (2\pi)^{-1/2} e^{-t^2/2}.$

The entries in the last line of Fisher's table, corresponding to $n = \infty$,
are the deviations from the mean of a normal curve with unit standard
deviation.

According to “Student,” the distribution of z tends to approach
a normal curve with a standard deviation of $(N-3)^{-1/2}$ for large
values of N. Deming and Birge (loc. cit.) have suggested that the
distribution tends to approach normality with $(N - 3/2)^{-1/2}$ as
standard deviation. Anyhow, for large values of N, $(N-3)^{1/2} z$
would be approximately normally distributed about zero with unit
standard deviation. Since

    $(N-3)^{1/2} z = \left\{\frac{N-3}{N-1}\right\}^{1/2} t \quad \text{and} \quad (N-3)^{1/2} z = \frac{(\bar{x} - \tilde{x})(N-3)^{1/2}}{s},$

it is frequently satisfactory in applications to refer

(34)    $\frac{(\bar{x} - \tilde{x})(N-3)^{1/2}}{s}$

to a normal probability scale when N > 30.

For large values of N, (34) represents so small a refinement over
(22) of Chapter VI that the additional computation seems
unwarranted. So when N considerably exceeds 30 the older procedure of
replacing $\sigma$ by s and treating $t = (\bar{x} - \tilde{x})(N)^{1/2}/s$ as though it were
normally distributed with unit standard deviation is not appreciably
erroneous.





10. Difference Between Two Means. Fisher 7 demonstrated that
(32) has a much wider range of application than the problem for
which it was designed. He showed that the t-distribution is
applicable whenever we are dealing with a normally distributed variate
whose standard deviation is not known exactly but is independently
estimated from observations amounting to n degrees of freedom.
The scheme by which the “Student” idea is made available to other
problems consists in constructing a variable t in the nature of a
fraction whose numerator is any statistic normally distributed and whose
denominator is the square root of an independently distributed and
unbiased estimate of the variance of the numerator involving n
degrees of freedom. Thus the t-distribution has been found useful
in such problems as testing the significance of the difference
between two means and testing hypotheses regarding regression
coefficients.

Let $\bar{x}_1$, $\bar{x}_2$ be the means and $s_1$, $s_2$ the standard deviations of two
independent samples of $N_1$ and $N_2$ variates, respectively, from a
normal universe with mean $\tilde{x}$ and variance $\sigma^2$. According to (10) of
Chapter VI the variance of the difference between the two means is
$\sigma^2(N_1 + N_2)/N_1N_2$. Then it can be proved 14 that the variable

(35)    $t = \frac{\bar{x}_1 - \bar{x}_2}{\sigma} \left\{\frac{N_1N_2}{N_1 + N_2}\right\}^{1/2}$

is normally distributed with unit standard deviation. However, in
most practical problems $\sigma$ is unavailable and must be estimated from
the samples. Using the unbiased estimate defined in (5), the above
formula becomes

(36)    $t = \frac{\bar{x}_1 - \bar{x}_2}{(N_1s_1^2 + N_2s_2^2)^{1/2}} \left\{\frac{N_1N_2(N_1 + N_2 - 2)}{N_1 + N_2}\right\}^{1/2}.$

Fisher showed that (36) is distributed in accord with (32) for
$n = N_1 + N_2 - 2$, and we can find from Fisher's table of P the
probability of a greater difference between the means than that observed.

As $N_1$ and $N_2$ become large, $(N_1 + N_2)/(N_1 + N_2 - 2)$ tends
toward unity and (36) tends toward the value

(37)    $t = \frac{\bar{x}_1 - \bar{x}_2}{\left(\dfrac{s_1^2}{N_2} + \dfrac{s_2^2}{N_1}\right)^{1/2}}.$

Since (36) is asymptotically normally distributed, the older procedure
of referring (37) to a normal probability scale in testing a null
hypothesis that two samples are from the same universe would not be
invalid to any appreciable extent for large values of $N_1$ and $N_2$.
The present writer 15 has recently called attention to an erroneous
formula which is commonly used in place of (37).

If one of the samples, say $N_2$, is so much larger than the other that
it tends toward the universe, then $\bar{x}_2$ tends toward $\tilde{x}$ and $s_2$ tends
toward $\sigma$. So, under these conditions, (37) tends toward

    $t = \frac{(\bar{x}_1 - \tilde{x})\sqrt{N_1}}{\sigma}$

which, if the subscripts are dropped, is the formula used in testing a
null hypothesis that a given sample comes from a normal universe
with a proposed mean. When $N_1 = N_2 = N$, (36) reduces to

(38)    $t = (\bar{x}_1 - \bar{x}_2) \left\{\frac{N-1}{s_1^2 + s_2^2}\right\}^{1/2}.$

Inasmuch as we do not ordinarily know whether a sample is drawn
from a normal universe or some other type of universe, a question
quite naturally arises as to whether the procedure inaugurated by
“Student” and extended by Fisher is applicable to small samples
from non-normal universes. The question may be considered
partially answered by Bartlett 16 and others who have shown that it
gives a good approximation for considerable departures from
normality in the sampled universe. However, a word of caution seems
to be in order lest the new procedure be oversold in the applications
by completely neglecting the underlying assumptions of normality
in the universe and randomness of the samples.

The following examples, cited by Rietz, 17 illustrate the “Student”
theory.

Example 1. Suppose a random sample of N = 5 is obtained from a
hypothetical normal universe whose mean is $\tilde{x} = 2$. It is found that $\bar{x} = 3$ and

= $ for the sample. What is the probability that one would obtain a sample 
of five for which $\bar{x}$ would differ numerically from $\tilde{x}$ by as much as unity?

Solution. From (12), $t = \sqrt{5} = 2.236$. Entering Fisher's table for n = 4,
we find the probability P between .1 and .05. Reference to the more extensive
table in Metron 12 gives P = .0892 for the probability of a discrepancy as large
as the one observed. It is interesting to compare this result with what would
be obtained by reference to a normal probability scale. We find P = .0254 for
a deviation outside $t = \pm 2.236$. In terms of the odds that a mean, $\bar{x}$, will
deviate numerically more than 1 from theory, the contrast is more striking.
Thus, under the “Student” theory we should say that the odds are 10,000 to
892 or roughly 11 to 1 against a deviation as large as or larger than the one
observed. Under the normal theory the odds are 10,000 to 254, or about 40 to
1 against such a deviation.

Example 2. The following data represent the yields in bushels of Indian
corn on ten subdivisions of equal areas of two agricultural plots in which Plot 1
was a control plot treated the same as Plot 2, except for the amount of
phosphorus applied as a fertilizer.

               Plot 1     Plot 2
                6.2        5.6
                5.7        5.9
                6.5        5.6
                6.0        5.7
                6.3        5.8
                5.8        5.7
                5.7        6.0
                6.0        5.5
                6.0        5.7
                5.8        5.5
    Totals     60.0       57.0
    Means      $\bar{x}_1 = 6.0$   $\bar{x}_2 = 5.7$

Is there a significant difference between the yields on the two plots, using the
difference between their means as a criterion of judgment?

Solution.

    $s_1^2 = \frac{.64}{10} = .064, \qquad s_2^2 = \frac{.24}{10} = .024.$

Substitution in (38) gives

    $t = (6.0 - 5.7)\left\{\frac{9}{.064 + .024}\right\}^{1/2} = (.3)(10.113) = 3.034.$

Entering “Student's” tables in Metron (loc. cit.) at n = 18, we find P = .0072
for the probability that t will fall outside the range -3.034 and +3.034. Hence
a null hypothesis that the samples are from the same universe would be refuted
by the test for both the .05 and .01 levels of significance. In other words, our
conclusion is that, on the levels of significance adopted, there is a significant
difference between the yields on the plots.
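
The arithmetic of Example 2 can be reproduced with a short sketch. It assumes that the flattened data column alternates Plot 1, Plot 2 (which reproduces the stated totals 60.0 and 57.0), and it forms t as the difference of the means divided by the square root of the pooled unbiased variance estimate times $(N_1 + N_2)/N_1N_2$. The function names are illustrative.

```python
import math

plot1 = [6.2, 5.7, 6.5, 6.0, 6.3, 5.8, 5.7, 6.0, 6.0, 5.8]
plot2 = [5.6, 5.9, 5.6, 5.7, 5.8, 5.7, 6.0, 5.5, 5.7, 5.5]

def mean(xs):
    return sum(xs) / len(xs)

def var_n(xs):
    """Sample variance with divisor N (the s^2 of the text)."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def t_two_means(a, b):
    """Difference of means over its estimated standard error,
    using the pooled estimate with N1 + N2 - 2 degrees of freedom."""
    n1, n2 = len(a), len(b)
    pooled = (n1 * var_n(a) + n2 * var_n(b)) / (n1 + n2 - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled * (n1 + n2) / (n1 * n2))

t = t_two_means(plot1, plot2)
degrees_of_freedom = len(plot1) + len(plot2) - 2
print(round(var_n(plot1), 3), round(var_n(plot2), 3))
print(round(t, 3), degrees_of_freedom)
```

With equal sample sizes this is the same quantity as $(\bar{x}_1 - \bar{x}_2)\{(N-1)/(s_1^2 + s_2^2)\}^{1/2}$, which is why the result agrees with the value 3.034 found in the solution.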


11. Fisher's z-Distribution. Suppose $u^2$ and $v^2$ are two
independent and unbiased estimates of the variance $\sigma^2$ of a variable x
which is normally distributed. If these estimates are based upon
samples of $N_1$ and $N_2$, respectively, or upon $n_1$ and $n_2$ degrees of
freedom, then we have

    $u^2 = \frac{1}{N_1 - 1} \sum_{i=1}^{N_1} (x_{1i} - \bar{x}_1)^2 = \frac{1}{n_1} \sum_{i=1}^{n_1+1} (x_{1i} - \bar{x}_1)^2$

    $v^2 = \frac{1}{N_2 - 1} \sum_{i=1}^{N_2} (x_{2i} - \bar{x}_2)^2 = \frac{1}{n_2} \sum_{i=1}^{n_2+1} (x_{2i} - \bar{x}_2)^2$

in which $\bar{x}_1$ and $\bar{x}_2$ are the means of the two samples. In previous
notation $u^2$ and $v^2$ would be denoted by $\hat{\sigma}_1^2$ and $\hat{\sigma}_2^2$ but these symbols
are too unwieldy in the present discussion.

In constructing a test of significance for the difference between two
sample variances it might seem logical to form the difference
$w = u^2 - v^2$ and seek the distribution function of w. However,
such a procedure is impractical because of the mathematical difficulty
involved in determining this function. Fisher circumvented this
difficulty by building a statistic, z, defined by

(39)    $z = \tfrac{1}{2}(\log_e u^2 - \log_e v^2) = \log_e \frac{u}{v}$


whose distribution function, G(z), he obtained and which proved to
have extremely wide application. To derive G(z) we make use of the
distribution of $H(s^2)$ given in (25), replacing N - 1 by n and $s^2$ by
$\{n/(n+1)\}u^2$ (see § 3). After this modification, (25) becomes

(40)    $\frac{(n/2\sigma^2)^{n/2}}{\Gamma(n/2)}\, (u^2)^{(n-2)/2}\, e^{-nu^2/2\sigma^2}\, d(u^2).$

Since $u^2$ and $v^2$ are independent their joint distribution is

(41)    $K\, (u^2)^{(n_1-2)/2} (v^2)^{(n_2-2)/2}\, e^{-(n_1u^2 + n_2v^2)/2\sigma^2}\, d(u^2)\, d(v^2)$

where

    $K = \frac{n_1^{n_1/2}\, n_2^{n_2/2}}{2^{(n_1+n_2)/2}\, \sigma^{n_1+n_2}\, \Gamma(n_1/2)\, \Gamma(n_2/2)}.$

From (39) we have

(42)    $u^2 = v^2 e^{2z}$

and for a fixed value of $v^2$,

(43)    $d(u^2) = 2v^2 e^{2z}\, dz.$

Using (42) and (43) in (41) we obtain

(44)    $2K e^{n_1 z}\, e^{-(n_1 e^{2z} + n_2)v^2/2\sigma^2}\, (v^2)^{(n_1+n_2-2)/2}\, d(v^2)\, dz$

for the joint distribution of $v^2$ and z. Integrating (44) with respect
to $v^2$ between the limits 0 and $\infty$ and making use of the Gamma
function we obtain the distribution of z,

(45)    $G(z)\, dz = \frac{2\, n_1^{n_1/2}\, n_2^{n_2/2}}{B(n_1/2,\, n_2/2)} \cdot \frac{e^{n_1 z}\, dz}{(n_1 e^{2z} + n_2)^{(n_1+n_2)/2}}.$

The function G(z) has the important property that it depends
solely upon $n_1$ and $n_2$, not at all upon the variance of the sampled
universe. Fisher's z should not be confused with the z-distribution of
“Student.”

The distribution function of z is extremely general, including as
special cases, the $\chi^2$-distribution, the t-distribution of “Student” and
Fisher, and the normal distribution. Rider 18 has made easily
available the transformations and substitutions by which these special
cases can be obtained from (45).

The positive part of the curve for $z = \log_e (u/v)$ is the same as the
negative part for $z = \log_e (v/u)$. Since it is optional which estimate is
considered as $u^2$ it is necessary, in tabulating the probability integral
of G(z), to consider only positive values of z by making $u^2$ the larger
variance estimate (based on $n_1$ degrees of freedom).

Let $Q = \int_{-\infty}^{z_0} G(z)\, dz$ and let P = 1 - Q. Thus P is the probability
that $z > z_0$. In his book, Fisher 19 has given values of $z_0$ corresponding
to the probabilities P = .05 and .01 for various combinations of
$n_1$ and $n_2$. These values, $z_0$, are called the “5% and 1% points”
and are used as critical values in judging significance. It should be
noticed that Fisher's “points” are based on the area of the whole
curve and therefore they should not be confused with 5% and 1%
“levels of significance” previously used. In the latter sense,
Fisher's “points” would be 10% and 2% “levels of significance.”
In other words, a 5% point means a value of z such that one “tail”
under the curve is .05, whereas a 5% level of significance meant a
value of t such that the sum of both “tails” (outside $\pm t$) is .05.
It is hoped that tables of 5% and 1% levels of significance for z will
sometime be available.
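
The meaning of a “5% point” can be illustrated numerically. The sketch below assumes the density $G(z) = 2 n_1^{n_1/2} n_2^{n_2/2} e^{n_1 z} / \{B(n_1/2, n_2/2)(n_1 e^{2z} + n_2)^{(n_1+n_2)/2}\}$ and integrates it by Simpson's rule for $n_1 = n_2 = 9$, the case used in Example 3, where Fisher's table gives $z_0 = .58$. The integration limits of ±8 and the helper names are assumptions made for the illustration.

```python
import math

def fisher_z_density(z, n1, n2):
    """Density G(z) of Fisher's z, evaluated through logarithms for stability."""
    log_beta = (math.lgamma(n1 / 2) + math.lgamma(n2 / 2)
                - math.lgamma((n1 + n2) / 2))
    log_g = (math.log(2)
             + 0.5 * n1 * math.log(n1) + 0.5 * n2 * math.log(n2)
             + n1 * z - log_beta
             - 0.5 * (n1 + n2) * math.log(n1 * math.exp(2 * z) + n2))
    return math.exp(log_g)

def simpson(f, a, b, steps=4000):
    """Composite Simpson's rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * f(a + i * h)
    return total * h / 3

n1 = n2 = 9

def g(z):
    return fisher_z_density(z, n1, n2)

total_area = simpson(g, -8.0, 8.0)   # should be very nearly 1
upper_tail = simpson(g, 0.58, 8.0)   # one tail beyond the 5% point
print(total_area, upper_tail)
```

Only the single upper tail is close to .05 here; the two tails together (beyond ±.58) carry about twice that, which is the distinction drawn above between a “point” and a “level of significance.”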






12. Significance of Difference Between Variances. The usual
hypothesis tested by the z-test is that $u^2$ and $v^2$ are estimates of one
and the same population variance and therefore that z = 0. The
significance of the divergence of the observed value of z from zero
is the crux of the test. Small values of z mean a tenable hypothesis
whereas values of z larger than $z_0$ refute the hypothesis. If for
P = .05 (or .01) the observed value of z, as computed from the
samples in accordance with (39), is larger than $z_0$, the hypothesis
is to be rejected and the conclusion is that the samples come from
universes with different variances.

Logically, the z-test should be applied before testing the difference
between two means since the latter test depends on the equality of
the population variances.

To avoid the troublesome logarithmic computation involved in
(39) Snedecor 20 has published tables which transform Fisher's 5%
and 1% points into the ratio $u^2/v^2$, where $e^{2z} = u^2/v^2$. Snedecor
calls this ratio F in honor of Fisher.* Therefore,

    $F = \frac{u^2}{v^2} = e^{2z},$

where $u^2$ is to be chosen the larger of the two given variance estimates.
This table is reproduced in the Appendix. (See Table II.)


* In their new Statistical Tables Fisher and Yates call it the variance ratio.
These tables are published by Oliver and Boyd, London.

Example 3. In Example 2 suppose we wish to test the assumption, which
was made there, that the two samples come from universes with equal variance.
We have

    $u^2 = \frac{n_1 + 1}{n_1}\, s_1^2 = \frac{.64}{9} = .0711$

    $v^2 = \frac{n_2 + 1}{n_2}\, s_2^2 = \frac{.24}{9} = .0267$

    $F = \frac{.0711}{.0267} = 2.663$

    $z = .5 \log_e F = 1.1513 \log_{10} F = .49.$

Entering Fisher's table (loc. cit.) for $n_1 = n_2 = 9$ we find $z_0 = .58$ for P = .05
and $z_0 = .84$ for P = .01. This means that, if the true value of z were zero,
random sampling fluctuations would be expected to give a value of z as great as
.84, or greater, once in 100 trials, and a value of z as great as .58, or greater, five
times in 100 trials. The observed value of z is .49 and so this value might be
accounted for by chance, at either the .05 or .01 points of significance. Using
Snedecor's table for the same degrees of freedom we find F = 3.18 for P = .05
and F = 5.35 for P = .01. Since the observed value of F is only 2.663, we
conclude that we were justified in proceeding with the t-test.
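
A minimal sketch of the computation in Example 3, assuming the sample variances $s_1^2 = .064$ and $s_2^2 = .024$ (divisor N = 10) from Example 2. Carrying full precision gives a ratio of about 2.667 rather than the 2.663 obtained from the rounded quotients; the value of z is unaffected to two decimals. Variable names are illustrative.

```python
import math

N1 = N2 = 10
s1_sq, s2_sq = 0.064, 0.024          # sample variances with divisor N

# Unbiased estimates on n = N - 1 = 9 degrees of freedom each.
u_sq = N1 / (N1 - 1) * s1_sq
v_sq = N2 / (N2 - 1) * s2_sq

# The larger estimate goes in the numerator so that z is positive.
F = max(u_sq, v_sq) / min(u_sq, v_sq)
z = 0.5 * math.log(F)

z_5_percent_point = 0.58             # Fisher's table, n1 = n2 = 9 (from the text)
print(round(u_sq, 4), round(v_sq, 4), round(F, 3), round(z, 2))
print("hypothesis of equal variances tenable:", z < z_5_percent_point)
```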

When the samples are large there are two procedures available.

I. G(z) is skew when $n_1 \neq n_2$ but when $n_1 = n_2$ it is symmetrical.
When $n_1$ and $n_2$ are large, and also for moderate values when they
are equal or nearly equal, one can verify (by taking logarithms) that
z is approximately normally distributed with mean zero
and variance $\tfrac{1}{2}(1/n_1 + 1/n_2)$. Therefore,

    $t = \frac{z}{\left\{\tfrac{1}{2}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)\right\}^{1/2}}$

may be referred to a normal probability scale.

II. Let $w = s_1 - s_2$. From (10) of Chapter VI and (29) of
this chapter,

(47)    $t = \frac{s_1 - s_2}{\sigma \left\{\dfrac{N_1 + N_2}{2N_1N_2}\right\}^{1/2}}$

is normally distributed about zero with unit standard deviation.
An estimate of the supposed common variance $\sigma^2$ is given in (5).
Using the square root of this estimate in place of $\sigma$ in (47) and
assuming that $N_1$ and $N_2$ are large enough to regard, without appreciable
error, the ratio $(N_1 + N_2)/(N_1 + N_2 - 2)$ as unity we obtain

(47a)    $t = \frac{s_1 - s_2}{\left\{\dfrac{s_1^2}{2N_2} + \dfrac{s_2^2}{2N_1}\right\}^{1/2}}.$

This value may then be referred to a normal probability scale.

An interesting derivation, using characteristic functions, of a
method for testing the significance of the difference between two
sample variances has recently been given by A. T. Craig. 21






13. Analysis of Variance. The test of significance between two
independent sample variances (with their appropriate degrees of
freedom) is a special case of a general technique, developed by Fisher,
for segregating the variance into portions traceable to specific sources.
In general, the kind of procedure one attempts to follow in such an
analysis can be illustrated by the following scheme.

Let us imagine a individuals $I_1, I_2, \cdots, I_a$, each subjected to
b treatments $T_1, T_2, \cdots, T_b$. For example, the I's may be
agricultural plots containing different varieties of some plant and the
T's may be applications of various kinds or amounts of fertilizers.
Or the I's might conceivably be various diabetic patients and the
T's varietal insulin treatments. The effects of the T's on the I's
yield a set of observations, to be denoted by $x_{jk}$, which vary from
one value of I to another for a fixed T and from one value of T to
another for a fixed I. Suppose, then, that N = ab independently
observed values of a normally distributed variable are classified into
a rows and b columns in accordance with some relevant scheme as
depicted in Table 13.


Table 13. Matrix of N = ab Independent Values
from a Normal Universe

              $T_1$       $T_2$      ...     $T_b$
    $I_1$     $x_{11}$,   $x_{12}$,  ...     $x_{1b}$
    $I_2$     $x_{21}$,   $x_{22}$,  ...     $x_{2b}$
    ...
    $I_a$     $x_{a1}$,   $x_{a2}$,  ...     $x_{ab}$



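The values in each row will vary about the mean of that row and
the values in each column will vary about the mean of that column.
Let $\bar{x}_{j.}$ denote the mean of the jth row,

(48)    $b\bar{x}_{j.} = \sum_{k=1}^{b} x_{jk}, \qquad j = 1, 2, \cdots, a,$

and let $\bar{x}_{.k}$ denote the mean of the kth column,

(49)    $a\bar{x}_{.k} = \sum_{j=1}^{a} x_{jk}, \qquad k = 1, 2, \cdots, b.$

(The dot indicates that summation has been effected on the index
which it replaces.) Let the mean of the entire set be $\bar{x}$ where

(50)    $ab\bar{x} = \sum_{j=1}^{a} \sum_{k=1}^{b} x_{jk}.$

Let the variance in the entire set due to all causes be Q/ab where

(51)    $Q = \sum_{j=1}^{a} \sum_{k=1}^{b} (x_{jk} - \bar{x})^2 = \sum_{i=1}^{ab} (x_i - \bar{x})^2.$

Now Q can be resolved into three quadratic forms as follows:

(52)    $Q = q_1 + q_2 + q_3$

where

    $q_1 = b \sum_{j=1}^{a} (\bar{x}_{j.} - \bar{x})^2$

    $q_2 = a \sum_{k=1}^{b} (\bar{x}_{.k} - \bar{x})^2$

    $q_3 = \sum_{j=1}^{a} \sum_{k=1}^{b} (x_{jk} - \bar{x}_{j.} - \bar{x}_{.k} + \bar{x})^2.$

That (52) is an identity in the N = ab values of x can be readily seen
as follows:

    $\sum_{j=1}^{a} \sum_{k=1}^{b} (x_{jk} - \bar{x})^2 = \sum_{j=1}^{a} \sum_{k=1}^{b} \{(x_{jk} - \bar{x}_{j.} - \bar{x}_{.k} + \bar{x}) + (\bar{x}_{j.} - \bar{x}) + (\bar{x}_{.k} - \bar{x})\}^2$

    $\qquad = \sum_{j=1}^{a} \sum_{k=1}^{b} (x_{jk} - \bar{x}_{j.} - \bar{x}_{.k} + \bar{x})^2 + \sum_{j=1}^{a} \sum_{k=1}^{b} (\bar{x}_{j.} - \bar{x})^2 + \sum_{j=1}^{a} \sum_{k=1}^{b} (\bar{x}_{.k} - \bar{x})^2.$

To show that the cross-product terms vanish consider the term

    $\sum_{j=1}^{a} \sum_{k=1}^{b} (x_{jk} - \bar{x}_{j.} - \bar{x}_{.k} + \bar{x})(\bar{x}_{j.} - \bar{x}).$

This becomes

    $\sum_{j=1}^{a} (\bar{x}_{j.} - \bar{x}) \sum_{k=1}^{b} (x_{jk} - \bar{x}_{j.} - \bar{x}_{.k} + \bar{x}) = \sum_{j=1}^{a} (\bar{x}_{j.} - \bar{x})(b\bar{x}_{j.} - b\bar{x}_{j.} - b\bar{x} + b\bar{x}) = 0.$

A similar demonstration can be made for the other cross-product
terms. This is left as an exercise for the student. Since

    $\sum_{j=1}^{a} \sum_{k=1}^{b} (\bar{x}_{j.} - \bar{x})^2 = b \sum_{j=1}^{a} (\bar{x}_{j.} - \bar{x})^2$

and

    $\sum_{j=1}^{a} \sum_{k=1}^{b} (\bar{x}_{.k} - \bar{x})^2 = a \sum_{k=1}^{b} (\bar{x}_{.k} - \bar{x})^2,$

(52) is established.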

The variability between rows is measured by $q_1$ and between columns
by $q_2$. The residual variability, freed from the influence of either
rows or columns, is measured by $q_3$ and is called interaction (sometimes
also discrepance). It may be regarded as the “experimental error”
inherent in the experiment and over which no control is attempted.
As will be shown later, it is used as a standard against which the
variability measured by either $q_1$ or $q_2$ may be tested for significance,
when the appropriate number of degrees of freedom are taken into
account.

From (51) the number of degrees of freedom in Q is seen to be
N - 1. Since there are a values of $\bar{x}_{j.}$ the number of degrees of
freedom in $q_1$ is (a - 1). Similarly, the number in $q_2$ is (b - 1).
This leaves (N - 1) - {(a - 1) + (b - 1)} = (a - 1)(b - 1) for
$q_3$, a result which may also be deduced from the expression for $q_3$.
Another form of argument is as follows. The ab means of rows and
columns form an (a × b)-fold table of (a - 1)(b - 1) degrees of
freedom since the marginal means are fixed in terms of the $x_{jk}$ values.
Anyhow, the number of degrees of freedom in interaction is the
product of the numbers in the interacting forces. Accordingly, an
unbiased estimate of $\sigma^2$ from the rows is $q_1/(a - 1)$, from the columns
is $q_2/(b - 1)$, and from interaction is $q_3/(a - 1)(b - 1)$. It is
clear, therefore, that the z-distribution can be employed to test the
significance of the variability attributable to these sources if the
independence of the above-mentioned estimates is assured. A. T.
Craig 21 has settled this point by establishing the independence of
the q's.

The quantities required in an analysis of variance are summarized
in Table 14. They can be readily computed except possibly $q_3$. So
long as the arithmetic involved in computing the other quantities
is carefully checked it is sufficient to evaluate $q_3$ from relation (52).
In other words, the sum of squares due to interaction may be found
by subtracting $(q_1 + q_2)$ from the total sum of squares.


Table 14

    Variance due to   D. of F.          Sum of Squares                                                Unbiased Estimates
    Rows              a - 1             $q_1 = b \sum_{j=1}^{a} (\bar{x}_{j.} - \bar{x})^2$                  $q_1/(a - 1)$
    Columns           b - 1             $q_2 = a \sum_{k=1}^{b} (\bar{x}_{.k} - \bar{x})^2$                  $q_2/(b - 1)$
    Interaction       (a - 1)(b - 1)    $q_3 = Q - q_1 - q_2$                                         $q_3/(a - 1)(b - 1)$
    Total             ab - 1            $Q = \sum_{j=1}^{a} \sum_{k=1}^{b} (x_{jk} - \bar{x})^2$




Under the null hypothesis that there is no significant variation
from row to row, the quantity

(53)    $z = \tfrac{1}{2} \log_e \frac{(b - 1)\, q_1}{q_3}$

will be distributed in accord with (45) and the hypothesis can be
tested from critical values of z or, more conveniently, perhaps, from
Snedecor's table by computing

(54)    $F = \frac{(b - 1)\, q_1}{q_3}$

and entering the table at $(n_1, n_2)$ where $n_1 = a - 1$ and $n_2 =
(a - 1)(b - 1)$, the degrees of freedom of the row estimate and of the
interaction estimate. If the computed value falls above the critical value
adopted, the null hypothesis is rejected for that value. Similarly,
to test the null hypothesis that there are no significant effects from
column to column we compute

(55)    $F = \frac{(a - 1)\, q_2}{q_3}$

and compare it with one of the tabular entries for $n_1 = b - 1$, $n_2 =
(a - 1)(b - 1)$.

Example 4. On a feeding experiment a farmer has four types of hogs denoted
by I, II, III, IV. These types are each divided into three groups which are fed
varietal rations A, B, and C. The following results are obtained, the numbers
in the table being the gains in weight in pounds in the various groups.

                I       II      III     IV      Totals
    A           7.0    16.0    10.5    13.5      47.0
    B          14.0    15.5    15.0    21.0      65.5
    C           8.5    16.5     9.5    13.5      48.0
    Totals     29.5    48.0    35.0    48.0     160.5

The computations yield the following results:

                   Sum of Squares    D. of F.    Unbiased Estimates
    Rations           54.1250           2             27.06
    Types             87.7292           3             29.24
    Interaction       28.2083           6              4.70

To test the significance of the variation in rations we refer F = 27.06/4.70 = 5.76
to Snedecor's table where, corresponding to (2, 6) degrees of freedom, we find
5.14 for the 5% point and 10.92 for the 1% point. Similarly, to test the
significance of the variation between types we compute F = 29.24/4.70 = 6.2. The
entries in the table for (3, 6) degrees of freedom are 4.76 for the 5% point
and 9.78 for the 1% point. Our conclusion is that there is a significant
difference between breeds and between varieties of rations at
the 5% point, but that neither is significant at the 1% point.
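
The sums of squares quoted in Example 4 can be reproduced with the following sketch, which takes the rations as rows (a = 3) and the types as columns (b = 4) and obtains the interaction term by subtraction, as suggested in the text. The variable names are illustrative.

```python
gains = {
    "A": [7.0, 16.0, 10.5, 13.5],
    "B": [14.0, 15.5, 15.0, 21.0],
    "C": [8.5, 16.5, 9.5, 13.5],
}
rows = list(gains.values())
a, b = len(rows), len(rows[0])            # a rations (rows), b types (columns)

grand = sum(sum(r) for r in rows) / (a * b)
row_means = [sum(r) / b for r in rows]
col_means = [sum(r[k] for r in rows) / a for k in range(b)]

Q = sum((x - grand) ** 2 for r in rows for x in r)       # total sum of squares
q1 = b * sum((m - grand) ** 2 for m in row_means)        # between rations
q2 = a * sum((m - grand) ** 2 for m in col_means)        # between types
q3 = Q - q1 - q2                                         # interaction (residual)

est_rations = q1 / (a - 1)
est_types = q2 / (b - 1)
est_interaction = q3 / ((a - 1) * (b - 1))

F_rations = est_rations / est_interaction                # refer to (2, 6) d.f.
F_types = est_types / est_interaction                    # refer to (3, 6) d.f.
print(round(q1, 4), round(q2, 4), round(q3, 4))
print(round(F_rations, 2), round(F_types, 2))
```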

14. Testing Variation in Sub-sets of Means. In a previous
chapter a method was given for testing the significance of a difference
between two means. We shall now show that the analysis of
variance technique lends itself to testing the significance of differences
between any number of group means.

Consider normal universes with means $\tilde{y}_x$, $(x = 1, 2, \cdots, b)$, and
variance $\sigma^2$. Let samples of $N_x$ be drawn one from each of these
universes and let $\bar{y}_x$ and $s_x^2$ be the mean and variance of the sample
of $N_x$. Thus we have b classes or arrays (as in a correlation table).
The notation for the samples is summarized in Table 15.





Table 15

    Classes                 1           2          ...    x           ...    b
    Means                   $\bar{y}_1$,    $\bar{y}_2$,   ...    $\bar{y}_x$,    ...    $\bar{y}_b$
    Standard Deviations     $s_1$,      $s_2$,     ...    $s_x$,      ...    $s_b$
    Frequencies             $N_1$,      $N_2$,     ...    $N_x$,      ...    $N_b$

Our problem is to test, from the samples, the hypothesis that
$\tilde{y}_1 = \tilde{y}_2 = \cdots = \tilde{y}_b$.

It can be shown (Cf. Part I, Ex. 3, p. 208) that the sum of the
squares of deviations of the $N = \sum_{x=1}^{b} N_x$ variates y from the mean
$\bar{y}$ of the entire set may be broken up into two parts such that

    $V = v_1 + v_2$

where

    $V = \sum (y - \bar{y})^2$  (summed over all N variates)

    $v_1 = \sum_{x=1}^{b} N_x (\bar{y}_x - \bar{y})^2$

    $v_2 = \sum_{x=1}^{b} N_x s_x^2.$

It is conventional to call $v_1$ the variation between classes and $v_2$ the
variation within classes.

An unbiased estimate of $\tilde{y}_x$ is $\bar{y}$ where $N\bar{y} = \sum_{x=1}^{b} N_x \bar{y}_x$. Hence there
are b - 1 degrees of freedom in $v_1$. An unbiased estimate of $\sigma^2$ from
the values of $\bar{y}_x$ is $v_1/(b - 1)$, and from the values of $s_x^2$ is $v_2/(N - b)$
since the variates in the computation of $s_x^2$ are subject to the linear
restriction $\sum y = N_x \bar{y}_x$ and there are b values of x. Therefore,
under the null hypothesis that $\tilde{y}_1 = \tilde{y}_2 = \cdots = \tilde{y}_b$, the quantity

(56)    $z = \tfrac{1}{2} \log_e \frac{(N - b)\, v_1}{(b - 1)\, v_2}$

is distributed in accord with (45) and the hypothesis can be tested
by computing

(57)    $F = \frac{(N - b)\, v_1}{(b - 1)\, v_2}$

and comparing it with the entries in Snedecor's table for $(n_1, n_2)$
where $n_1 = b - 1$, $n_2 = N - b$. The quantities required in the
computations are summarized in Table 16.


Table 16

    Variance due to     D. of F.    Sum of Squares    Unbiased Estimates
    Between Classes     b - 1       $v_1$             $v_1/(b - 1)$
    Within Classes      N - b       $v_2$             $v_2/(N - b)$
    Total               N - 1       V                 $F = \dfrac{(N - b)\, v_1}{(b - 1)\, v_2}$

The variation within classes is independent of the principle of
classification. Therefore, excessive variation between classes (variation of
the $\bar{y}_x$'s) as compared with variation within classes (variation of
sample values about their respective means) will cause F to fall
above the critical value adopted, and the null hypothesis is
contradicted or refuted for that value.

Examples from agricultural and certain branches of biological 
science will be found in the textbooks by Fisher and by Snedecor, 
and from the field of economics in Mills’ text (revised edition). 
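
As a small illustration of the between-classes and within-classes decomposition, the sketch below applies it to the two plots of Example 2 treated as b = 2 classes (assuming, as before, that the flattened data column alternates Plot 1, Plot 2). With only two classes the resulting F is the square of the t found in Example 2, which gives a convenient check. The function name is illustrative.

```python
classes = [
    [6.2, 5.7, 6.5, 6.0, 6.3, 5.8, 5.7, 6.0, 6.0, 5.8],   # Plot 1 of Example 2
    [5.6, 5.9, 5.6, 5.7, 5.8, 5.7, 6.0, 5.5, 5.7, 5.5],   # Plot 2 of Example 2
]

def one_way_F(groups):
    """Return F = (N - b) v1 / ((b - 1) v2) and its degrees of freedom."""
    b = len(groups)
    N = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / N
    means = [sum(g) / len(g) for g in groups]
    # v1: variation between classes; v2: variation within classes (sum of N_x s_x^2).
    v1 = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    v2 = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    return (N - b) * v1 / ((b - 1) * v2), (b - 1, N - b)

F, dof = one_way_F(classes)
print(round(F, 3), dof, round(F ** 0.5, 3))
```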

15. Testing Linear Regression. Consider a correlation table with
b arrays in the x direction. Let f(x) represent the frequency and $\bar{y}_x$
the mean in the array at x. Let $(\bar{x}, \bar{y})$ be the mean of the table and
$m_1$ and $m_2$ the linear regression coefficients as defined in Part I.
Suppose the $N = \sum_x \sum_y f(x, y)$ entries in the table constitute a sample
from a normal bivariate universe and we wish to test the hypothesis,
H, that the regression of y on x is linear. It is shown in Part I that
$Y_x - \bar{y} = m_1(x - \bar{x})$ is the equation of the line which fits the means
of the arrays best, in a least-squares sense, and so $Y_x$ is the estimated
mean of the array at x. (A slightly different notation was used in
Part I.)



The variation B between arrays can be resolved into two
components $B_1$ and $B_2$ such that

(58)    $B = B_1 + B_2$

where

    $B = \sum_{x=1}^{b} f(x)(\bar{y}_x - \bar{y})^2$

    $B_1 = \sum_{x=1}^{b} f(x)(\bar{y}_x - Y_x)^2$

    $B_2 = \sum_{x=1}^{b} f(x)(Y_x - \bar{y})^2.$

To establish (58) we may write B in the form

    $\sum_{x=1}^{b} f(x)\{(\bar{y}_x - Y_x) + (Y_x - \bar{y})\}^2$

which upon expansion equals $B_1 + B_2$ because, as the student may
verify, the cross-product term vanishes.

It is shown in Part I that B = Nrj yx 2 <r v 2 (Cf. (39), p. 200) and 
B 2 = Nr 2 a„ 2 (Cf. (16), p. 172). Since B 2 is the part of B which is 
accounted for by H it follows from (58) that Bi — N<r y 2 {y yx 2 — r 2 ) 
is the part of B not accounted for by H. We are interested in the 
question, Is Bi excessive compared with the random sampling fluctua¬ 
tions to be expected under the null hypothesis? To answer this 
question consider the variation W within arrays where 

w = Zf(x)(y - y,y. 

In Part I this was designated by NS' y 2 which in turn is equal 
to A r <r 1/ 2 (1 — r} yx 2 ). This variation within classes is due to a host 
of random forces which are not dependent on the value of x de¬ 
fining the arrays. Therefore, W provides a basis for testing 
whether Bi is small enough to be accepted as the resultant of random 
forces under H or whether it is so large as to contradict H. Before 
we can use the 2-test, however, the degrees of freedom must be 
reckoned. In B there are 6 — 1 degrees of freedom because the 6 

b 

values of y x are subject to the linear restriction ^2y4( x ) = Ny. 

x = l 

The number in B 2 may be determined by making use of the regression 
equation and writing jB 2 in the form 

Z/(x)(r, - y) = »»i ! £/(*)(* - s) 2 - 



Small or Exact Sampling Theory 


155 


Since $\sum_{x=1}^{b} f(x)(x - \bar{x})^2$ is independent of the regression, the variation
in $B_2$ must be due to the single statistic $m_1$ and therefore involve one
degree of freedom. Hence, from (58), there are $b - 2$ degrees of
freedom in $B_1$. Since there are $b$ arrays there are $N - b$ degrees
of freedom in $W$. Consequently,

$z = \frac{1}{2}\log_e\left\{\frac{\eta^2 - r^2}{1 - \eta^2}\cdot\frac{N - b}{b - 2}\right\}$

is distributed in accord with (45) if $H$ is true. The computed value of

$\frac{\eta^2 - r^2}{1 - \eta^2}\cdot\frac{N - b}{b - 2}$

may, therefore, be compared with one of the entries in Snedecor’s
table for $n_1 = b - 2$, $n_2 = N - b$.

This is the test which was promised in Part I to replace the Blakeman
criterion which Fisher proved was unsound. The student may
construct a similar argument for testing an hypothesis of linear 
regression of x on y. 
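
The arithmetic of this test can be sketched in a few lines of Python. The function names and the numerical inputs below are illustrative assumptions, not values from the text; the sketch only evaluates the variance ratio with $b - 2$ and $N - b$ degrees of freedom and the corresponding $z = \frac{1}{2}\log_e$ of that ratio, leaving the table lookup to the reader.

```python
import math


def linearity_variance_ratio(eta_sq, r_sq, n_total, n_arrays):
    """Variance ratio for testing linearity of regression.

    eta_sq   : squared correlation ratio of y on x
    r_sq     : squared correlation coefficient
    n_total  : N, the total number of observations
    n_arrays : b, the number of arrays
    Returns (ratio, n1, n2) with n1 = b - 2 and n2 = N - b.
    """
    if not (0.0 <= r_sq <= eta_sq < 1.0):
        raise ValueError("need 0 <= r^2 <= eta^2 < 1")
    if n_arrays <= 2 or n_total <= n_arrays:
        raise ValueError("need b > 2 and N > b")
    n1 = n_arrays - 2
    n2 = n_total - n_arrays
    ratio = (eta_sq - r_sq) / (1.0 - eta_sq) * n2 / n1
    return ratio, n1, n2


def z_from_ratio(ratio):
    """Fisher's z, one half the natural logarithm of the variance ratio."""
    if ratio <= 0.0:
        raise ValueError("the variance ratio must be positive")
    return 0.5 * math.log(ratio)


# Hypothetical inputs, chosen only to illustrate the computation.
ratio, n1, n2 = linearity_variance_ratio(eta_sq=0.40, r_sq=0.30,
                                         n_total=100, n_arrays=10)
z = z_from_ratio(ratio)
print(ratio, n1, n2, z)
```

The ratio is the quantity to compare with Snedecor’s table; $z$ is the same statistic on the scale of Fisher’s table.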

16. Tests of Significance of r. Let the variables $x$, $y$ be simultaneously
distributed in accord with some one or other of the distribution
functions

$f(x, y) = Ke^{-P}$, $-\infty < x < \infty$, $-\infty < y < \infty$,

where

$\frac{1}{K} = 2\pi\sigma_x\sigma_y(1 - \rho^2)^{1/2}$

$P = \frac{1}{2(1 - \rho^2)}\left\{\frac{(x - \bar{x})^2}{\sigma_x^2} - \frac{2\rho(x - \bar{x})(y - \bar{y})}{\sigma_x\sigma_y} + \frac{(y - \bar{y})^2}{\sigma_y^2}\right\}$

and $\bar{x}$, $\bar{y}$, $\sigma_x$, $\sigma_y$, and $\rho$ are undetermined. In other words, suppose that
the universe is some normal bivariate distribution. The question of
the reliability of a value of $r$ computed from a sample of $N$ pairs of
$(x, y)$ from such a universe may conveniently be discussed under two
cases.

Case I. When $\rho = 0$. In testing the significance of an observed
value of $r$ we are testing the hypothesis that $\rho = 0$. Under this
hypothesis the sampling distribution of $r$ is known to be

$f(r) = k(1 - r^2)^{(N-4)/2}$, $-1 < r < 1$,





where $1/k = B\left(\frac{1}{2}, \frac{N-2}{2}\right)$. The curves represented by this function
are symmetrical about $r = 0$ with

$\sigma_r = (N - 1)^{-1/2}$.

As $N$ becomes large the function is practically normal and consequently

(59) $t = r(N - 1)^{1/2}$

tends to be normally distributed with mean zero and unit standard
deviation. Therefore, to test the significance of a value of $r$ computed
from a large sample it would not be invalid, to any appreciable
extent, to refer (59) to a normal probability scale.

When $N$ is small the problem may be resolved into an analysis of
variance. In a correlation table, the total variation in the $y$ direction
may be broken up into two parts, (1) the part $Nr^2\sigma_y^2$ which may be
accounted for by an hypothesis of linear regression and (2) the residual
part $NS_y^2 = N\sigma_y^2(1 - r^2)$. If there is no real correlation between
the two variables then parts (1) and (2) are estimates of the same
universe variance. Now to apply the $z$-test we must have unbiased
estimates. There is one degree of freedom in part (1) and $N - 2$ in
part (2).

Table 17

                     Variation                                          D. of F.
  Regression line    $\sum (Y_x - \bar{y})^2 f(x) = Nr^2\sigma_y^2$          $1$
  Residuals          $\sum (y - Y_x)^2 f(x) = N\sigma_y^2(1 - r^2)$          $N - 2$
  Totals             $\sum (y - \bar{y})^2 f(x) = N\sigma_y^2$               $N - 1$

Consequently we may test the independence of $y$ and $x$ by
computing

(60) $z = \frac{1}{2}\log_e\frac{r^2(N - 2)}{1 - r^2}$

and seeing if it lies beyond the 5% or 1% points in the table for
$n_1 = 1$, $n_2 = N - 2$. However, it is conventional to make use of





Fisher’s $t$-distribution. It can be shown (see Problem 10) that the
distribution of $t$ is a special case of that for $z$ when $n_1 = 1$, $n_2 = n$,
and $z = \frac{1}{2}\log_e t^2$. Therefore,

(61) $t = \frac{r(N - 2)^{1/2}}{(1 - r^2)^{1/2}}$

is distributed in accord with $F_n(t)$ for $n = N - 2$. In § 11 we observed
that the 0.05 level of significance for $z$ is the .025 point. However,
when used as an alternative to $t$, the 0.05 point of $z$ is also the
0.05 level because the whole distribution of $t$ is equivalent to the
positive half of the $z$-distribution in the sense that, for tests of
significance, $z$ ranges from 0 to $\infty$ whereas $t$ ranges from $-\infty$ to $\infty$.

Tables are available (Fisher’s text, Table V.A.) for applying this
test directly from $r$, giving values of $r$ on four levels of significance
represented by $P = .10$, $.05$, $.02$, and $.01$, for various values of $n$.
It might prove interesting to compare an entry in this table with the
corresponding entries in the $z$ and $t$ tables. For example, when
$n = 18$ ($N = 20$) we find from this table that $r = .4438$ lies on the
$P = .05$ level, and making the transformation to $z$ by (60) we obtain
$z = .7424$ which agrees exactly with the entry in the $z$-table at the
.05 point when $n_1 = 1$, $n_2 = 18$. Finally, when $r = .4438$ in (61),
$t = 2.101$ which is the entry in the $t$-table at the .05 level.
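
The comparison just made is easy to reproduce. The Python sketch below (function names are illustrative, not from the text) evaluates $t$ from $r$ by (61) and then $z = \frac{1}{2}\log_e t^2$ for $r = .4438$, $N = 20$; the results should come out close to the tabular entries 2.101 and .7424 quoted above.

```python
import math


def t_from_r(r, n_pairs):
    """Equation (61): t = r * sqrt(N - 2) / sqrt(1 - r^2), with n = N - 2."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    if n_pairs < 3:
        raise ValueError("at least three pairs are needed")
    return r * math.sqrt(n_pairs - 2) / math.sqrt(1.0 - r * r)


def z_from_t(t):
    """z = (1/2) log_e t^2, the relation between t and z when n1 = 1."""
    return 0.5 * math.log(t * t)


t_value = t_from_r(0.4438, 20)
z_value = z_from_t(t_value)
print(t_value, z_value)
```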

Case II. When $\rho \neq 0$. If the samples are large ($N > 100$) and
if $\rho$ is small or only moderately large ($|\rho| < .6$ perhaps) then it is
true that $r$ is approximately normally distributed about the value $\rho$
with standard deviation of

$\sigma_r = (1 - \rho^2)(N - 1)^{-1/2}$.

It is customary, under these conditions, to attach to an observed
value of $r$ a standard error of

$\sigma_r = (1 - r^2)(N - 1)^{-1/2}$

and, for a proposed $\rho$, to refer the computed value of

$t = \frac{r - \rho}{\sigma_r}$

to a normal probability scale.

This procedure is invalid, however, if $N$ is small and $\rho$ is large.
The distribution of $r$ from small samples is skew and the skewness
increases with $\rho$. This may be understood intuitively by considering
the distribution of $r$’s from a universe in which $\rho$ is .9. The range of




possible variation of $r$ above $\rho$ is only .1. But the possible range
below $\rho$ is 1.9. Accordingly the sampling distribution of $r$ ($N$ small)
from this universe will be sharply skew. An extensive cooperative
study of the distribution of $r$ was made in 1917 by Soper and others.$^{23}$




They succeeded in finding expressions for its moments and on this
basis represented the distribution, for various values of $N$ and $\rho$, by
Pearson curves. They also gave an elaborate set of tables of ordinates
for values of $\rho$ from 0 to 1 by increments of .1 and for values
of $r$ from $-1$ to $+1$ by increments of .05. The upper panel of
Figure 22 (from Fisher’s book) shows the $r$ curves for two values




of $\rho$ with $N = 8$, which (presumably) were drawn from the ordinates
of these tables. They indicate the rapid departure from normality
that may be expected for small samples as $\rho$ approaches high values.

In his study of the sampling distribution of the correlation coefficient
Fisher found that it was not desirable to use $r$ as the independent
variable and he introduced a transformation which has distinctive
merits. He showed that the quantity*

(62) $z' = \frac{1}{2}\log_e\frac{1 + r}{1 - r}$

is approximately normally distributed and is nearly constant in form
as $\rho$ changes. Its mode is always close to the value which (62) gives
when $r$ is replaced by $\rho$. The lower panel of
Figure 22 shows the distribution curves for $z'$ corresponding to the $r$
curves in the upper panel. The standard deviation is

(63) $\sigma_{z'} = (N - 3)^{-1/2}$

and is practically independent of $\rho$. The transformation is applicable
in the following tests (among others).

(a) To test if an observed value of r differs significantly from a 
proposed theoretical value. 

(b) To test if two observed values are significantly different. 

The procedure for (a) is to calculate

$t = (z' - \zeta')(N - 3)^{1/2}$,

where $\zeta'$ denotes the value which (62) gives when $r$ is replaced by the
proposed $\rho$, and refer the result to a normal probability scale. For (b) the
procedure is to find, in accordance with (62), the two values of $z'$, say
$z'_1$ and $z'_2$, corresponding to the two observed values of $r$, say $r_1$ and $r_2$
from samples of $N_1$ and $N_2$, respectively. Then compute $d = z'_1 - z'_2$
and $\sigma_d = \{1/(N_1 - 3) + 1/(N_2 - 3)\}^{1/2}$ and refer

$t = \frac{d}{\sigma_d}$

to a normal probability scale.

For numerical examples the student is referred to Fisher’s book,
§§ 33-35. Tables are also available there to facilitate the computation
of $z'$ for an assigned $r$. One should observe that the $z'$
technique is not applicable to the case of simple tests of significance
($\rho = 0$). In that case Fisher’s table of $t$ is available.
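
Tests (a) and (b) can be sketched in Python as follows. The function names and the numerical inputs are illustrative assumptions. The sketch also assumes that the proposed value $\rho$ in test (a) is passed through transformation (62) in the same way as $r$ before the difference is taken. Each function returns a deviate to be referred to a normal probability scale.

```python
import math


def z_prime(r):
    """Transformation (62): z' = (1/2) log_e((1 + r) / (1 - r))."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return 0.5 * math.log((1.0 + r) / (1.0 - r))


def deviate_against_proposed(r, rho, n_pairs):
    """Test (a): observed r against a proposed rho, using sigma = (N - 3)^(-1/2)."""
    if n_pairs <= 3:
        raise ValueError("N must exceed 3")
    return (z_prime(r) - z_prime(rho)) * math.sqrt(n_pairs - 3)


def deviate_between_samples(r1, n1, r2, n2):
    """Test (b): difference of two observed r's from samples of n1 and n2 pairs."""
    if n1 <= 3 or n2 <= 3:
        raise ValueError("both sample sizes must exceed 3")
    d = z_prime(r1) - z_prime(r2)
    sigma_d = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return d / sigma_d


# Hypothetical inputs, chosen only to illustrate the computation.
dev_a = deviate_against_proposed(r=0.50, rho=0.30, n_pairs=28)
dev_b = deviate_between_samples(r1=0.50, n1=28, r2=0.30, n2=39)
print(dev_a, dev_b)
```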

* This quantity is not quite the same as the z used for the ratio of two vari¬ 
ances and so we use a prime here to distinguish between them. 





Three final remarks seem appropriate. (1) In computing an $r$
to be tested it is not desirable to apply Sheppard’s corrections to
$s_x$ and $s_y$ because they tend to increase the value of $r$. This also
applies in testing for linear regression (§ 15). (2) It has been shown
that the $z'$ procedure is applicable in testing the significance of partial
correlation coefficients if $N$ in $\sigma_{z'}$ is replaced by $N - k$ where $k$ is the
number of secondary subscripts in the coefficient. (3) All of the
above procedures are strictly valid only for normal universes. However,
there is considerable experimental evidence to indicate that
they hold for all practical purposes provided the marginal distributions
of one or both variables in the universe are not of the J- or U-shaped
types. Of course, in those extreme cases one would naturally
hesitate to use $r$ as a measure of association.

References and Notes 

1. R. A. Fisher and some other writers use the symbol $s^2$ to denote an unbiased
   estimate of $\sigma^2$ based on a single sample.

2. A good discussion of the problem of Statistical Estimation is given by
   J. Neyman, Lectures and Conferences on Mathematical Statistics, pp. 127-142,
   1938. The Graduate School of the United States Department of
   Agriculture, Washington, D. C.
   See also
   E. S. Keeping, Note on a Point in the Theory of Sampling. American
   Mathematical Monthly, vol. 42, p. 161.
   E. G. Olds, A Note on the Problem of Estimation. American Mathematical
   Monthly, vol. 44, p. 92.

3. Egon Pearson, The Application of Statistical Methods to Industrial
   Standardization and Quality Control. British Standards Institution. 1935.

4. W. A. Shewhart, Economic Control of Quality of Manufactured Product.
   D. Van Nostrand. 1931.

5. A. T. Craig, A Certain Mean-Value Problem in Statistics. Bulletin American
   Mathematical Society, vol. 42, pp. 670-674.

6. “Student,” Probable Error of Mean. Biometrika, vol. 6, 1908. His real
   name was W. S. Gosset. Interesting biographical sketches of Mr. Gosset
   will be found in the Journal of the American Statistical Association, vol.
   33, No. 201, March, 1938, pp. 226-228; and the Journal of the Royal
   Statistical Society, vol. CI, Part I, pp. 248-251.
   For an account of the earlier contributions by Helmert (1876) and
   Czuber (1891) see Some Topics in Sampling Theory, H. L. Rietz. Bulletin
   American Mathematical Society, April, 1937.

7. R. A. Fisher, Applications of “Student’s” Distribution. Metron, vol. 5,
   no. 3, pp. 90-93.

8. R. C. Geary, The Distribution of “Student’s” Ratio for Non-Normal Samples.
   Journal of the Royal Statistical Society Supplement, vol. 3, 1936,
   pp. 178-84.




9. V. Romanovsky, On the Moments of Standard Deviation and of Correlation
   Coefficient in Samples from Normal. Metron, vol. 5, no. 4, pp. 3-46.

10. Paul R. Rider, A Survey of the Theory of Small Samples. Annals of
    Mathematics, (2), vol. 31, no. 4, pp. 577-628.

11. Deming and Birge, On the Statistical Theory of Errors. Review of Modern
    Physics, vol. 6, pp. 119-161. (Reprints available at the Graduate
    School of the United States Department of Agriculture, Washington,
    D. C.)

12. Vol. 5, no. 3, pp. 114-120.

13. R. A. Fisher, Statistical Methods for Research Workers. Oliver and Boyd.
    Table IV.

14. Dunham Jackson, The Theory of Small Samples. American Mathematical
    Monthly, vol. 42, 1935, pp. 344-364.

15. John F. Kenney, A Note on Certain Formulas used in Sampling Theory.
    American Mathematical Monthly, vol. 45, pp. 456-458.

16. M. S. Bartlett, The Effect of Non-Normality on the t-Distribution. Proc.
    Camb. Phil. Soc., vol. 31 (1935), pp. 223-231.

17. H. L. Rietz, Comments on Applications of Recently Developed Theory of Small
    Samples. Journal American Statistical Association, vol. 26 (1931), pp.
    150-158.

18. P. R. Rider, A Note on Small Sample Theory. Journal American Statistical
    Society, vol. 26, no. 174, pp. 172-174.

19. Table VI in reference 13.

20. G. W. Snedecor, Statistical Methods. Collegiate Press, Inc. Ames, Iowa.

21. A. T. Craig, On the Difference Between Two Sample Variances. National
    Mathematics Magazine, vol. 11 (1937), pp. 259-262.

22. A. T. Craig, On the Independence of Certain Estimates of Variance. Annals of
    Mathematical Statistics, vol. 9, pp. 48-55. See also a paper by Irwin
    entitled On the Independence of the Constituent Items in the Analysis of
    Variance. Journal Royal Statistical Society, Supplement 1-3, 1934-36,
    pp. 236-251.

23. H. E. Soper et al., On the Distribution of the Correlation Coefficient in Small
    Samples. Biometrika, vol. 11, pp. 328-413.


Problems 

1. Derive the expression for the expected value of $s^2$ in repeated samples of $N$
   independent observations from an arbitrary universe. Explain the use
   of this expression in estimating the variance of a universe.

2. In a certain observed distribution, $N = 20$, $\bar{x} = 42$, $s = 5$. Test the
   hypothesis that this distribution is a random sample from a normal universe
   with mean of 50.

3. In a certain test, one section of 20 students had an average score of 40 with
   a standard deviation of 5. Another section of 25 had an average of 46
   with standard deviation of 4. Does this indicate a significant difference
   in the two groups? What assumptions do you make in applying the test?





4. In an experiment in industrial psychology a job was performed by one
   group of 30 workmen according to Method I and by a second group of 40
   according to Method II. (The groups were independent and equally
   efficient.) Are the following distributions of the time (in seconds) taken
   such as to justify the conclusion that Method I is the speedier of the two?
   Use the difference between the means as a criterion of judgment.

       Time       I      II
        50        1       0
        51        3       1
        52        5       2
        53        4       5
        54        7       8
        55        5       9
        56        3       6
        57        1       3
        58        1       3
        59        0       1
        60        0       2
       Totals    30      40


5. From the separate distribution functions of $\bar{x}$ and $s$ derive the distribution
   of “Student’s” $z$, and from that obtain the function $F_n(t)$.

6. Prove that $F_n(t)$ is asymptotically normally distributed.

7. Derive Fisher’s $z$-distribution, $G(z)$.

8. (Mills’ text, revised.) Manufacturing industries were classified into those
   producing perishable, semi-durable, and durable goods. An average of
   changes occurring between 1929 and 1933 in the selling prices of the products
   of each of these categories was computed giving the index numbers
   shown in the $\bar{y}_x$ column of the following table.

       Class of industry, $x$          Number of            Means,        Computations
                                       industries, $N_x$    $\bar{y}_x$
       Producing perishable goods          34               69.81
       Producing semi-durable goods        26               66.41         $V_1$ = 2,161.8800
       Producing durable goods             25               78.96         $V_2$ = 15,564.9040
       All industries                      85                             $V$ = 17,726.7840

   Compute $F$ and test the null hypothesis that there was no real difference
   in the price movements of the three different classes of industry for the
   years 1929-1933.





2 mNi 




9. Prove that 

10. Prove that the test for significance between two means is a special case of
    the test for significant variation in sub-sets of means by showing that ( )
    of § 14 reduces, when $b = 2$, to

    $t = \frac{\bar{y}_1 - \bar{y}_2}{\hat{\sigma}}\left\{\frac{N_1 N_2}{N_1 + N_2}\right\}^{1/2}$

    where $\hat{\sigma}$ is an unbiased estimate of $\sigma$ and $t$ is distributed in accord with
    $F_n(t)$ for $n = N_1 + N_2 - 2$.

The following three problems are from Fisher’s book.

11. For the twenty years 1885-1904, the mean wheat yield of Eastern England
    was found to be correlated with the autumn rainfall; the correlation was
    found to be $-.629$. Is this value significant?

12. In a sample of $N = 25$ pairs of parent and child the correlation in a certain
    character was found to be .60. Is this value consistent with the view
    that the true correlation in that character was .46?

13. Of two samples the first, of 20 pairs, gives a correlation of .6, the second, of
    25 pairs, gives a correlation of .8. Are these values significantly different?



CHAPTER VIII 

A. THE $\chi^2$ DISTRIBUTION AND APPLICATIONS


1. The Multinomial Law.$^1$ The general term of the multinomial
expansion for $k$ mutually exclusive categories sets the stage for a
presentation of $\chi^2$ which provides an insight into the probability
theory of this important quantity and its usefulness in the testing of
hypotheses. So we begin with a preliminary treatment of the multinomial
law.

Consider an event that is characterized by a variable $v$ which can
take on one of $k$ values, $v_1, v_2, \cdots, v_k$. Let the probability that $v_i$
occurs be $p_i$, where $\sum_1^k p_i = 1$. Then in $N$ independent trials, the
probability that $v_1$ occurs $m_1$ times, $v_2$ occurs $m_2$ times, and so on,
in a specified order (whatever it may be) is

$p_1^{m_1} p_2^{m_2} \cdots p_k^{m_k}$

where $\sum_1^k m_i = N$, the $m$’s being positive integers or zero. The number
of ways in which the order can be specified is the number of
permutations possible among $N$ objects of which $m_1$ are of type $T_1$,
$m_2$ of type $T_2$, $\cdots$, $m_k$ of type $T_k$. Let this number be denoted by
$P[m_i]$. Then we have

$P[m_i] = \frac{N!}{m_1!\, m_2! \cdots m_k!}$.

Therefore, the probability that $m_1$ of the variates take the value $v_1$,
$m_2$ the value $v_2$, and so on, regardless of order is

(1) $f(m_1, m_2, \cdots, m_k) = P[m_i]\, p_1^{m_1} p_2^{m_2} \cdots p_k^{m_k}$

which is the general term of the expansion of the multinomial

$(p_1 + p_2 + \cdots + p_k)^N$.

The law of repeated trials, for a simple dichotomy, given in Chapter I,
is a special case of this law. Thus if $k = 2$, the right member
of (1) reduces to

(2) $C(N, r)\,p^r q^{N-r}$

where

$r = m_1$, $N - r = m_2$, $p = p_1$, $q = 1 - p_1$, $C(N, r) = N!/(m_1!\, m_2!)$.

If $v$ is the number of spots appearing on the top face in a throw of a
die, then $v$ will take on one of the values 1, 2, 3, 4, 5, 6, and the probability
of throwing exactly $r$ aces (say) in $N$ throws of the die is

$C(N, r)\left(\frac{1}{6}\right)^r\left(\frac{5}{6}\right)^{N-r}$.
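
As a check on (1) and its reduction to (2), the following Python sketch evaluates the general multinomial term exactly and compares the case $k = 2$ with the binomial expression for aces in throws of a die. The function name and the choice of six throws with two aces are illustrative assumptions, not taken from the text.

```python
import math


def multinomial_probability(counts, probs):
    """General term (1): N!/(m_1! ... m_k!) * p_1^m_1 ... p_k^m_k."""
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    if any(m < 0 for m in counts):
        raise ValueError("counts must be non-negative integers")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("the probabilities must sum to 1")
    n_trials = sum(counts)
    # Number of distinct orders; each partial quotient is an exact integer.
    orders = math.factorial(n_trials)
    for m in counts:
        orders //= math.factorial(m)
    return orders * math.prod(p ** m for p, m in zip(probs, counts))


# k = 2: exactly r = 2 aces in N = 6 throws of a die.
via_multinomial = multinomial_probability([2, 4], [1 / 6, 5 / 6])
via_binomial = math.comb(6, 2) * (1 / 6) ** 2 * (5 / 6) ** 4
print(via_multinomial, via_binomial)
```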

We recall that (2) is the general term of the expansion of the
binomial $(q + p)^N$. By using Stirling’s approximation for factorials,
we can derive an approximation for (1) which will bear to the multinomial
law a relation analogous to that which the normal curve bears
to the binomial. With this objective in mind, assume that every $m_i$
is sufficiently large for $m_i!$ to be replaced by its Stirling approximation.
Making these replacements (1) becomes, after some algebraic
rearrangement,

(3) $f(m_1, m_2, \cdots, m_k) = \frac{\prod_{i=1}^{k}(Np_i/m_i)^{m_i + 1/2}}{(2\pi N)^{(k-1)/2}(p_1 p_2 \cdots p_k)^{1/2}}$.

Next introduce the transformation

(4) $t_i = \frac{m_i - Np_i}{\sigma_i}$,

$\sigma_i^2$ being $Np_i(1 - p_i)$. Under this transformation (3) becomes

$(2\pi N)^{(k-1)/2}(p_1 p_2 \cdots p_k)^{1/2} f = \prod_{i=1}^{k}\left(1 + \frac{\sigma_i t_i}{Np_i}\right)^{-(Np_i + \sigma_i t_i + 1/2)}$.

Then

$\log \text{L.M.} = \sum_{i=1}^{k}\left(-Np_i - \sigma_i t_i - \frac{1}{2}\right)\log\left(1 + \frac{\sigma_i t_i}{Np_i}\right)$

where L.M. denotes the left-hand member of the preceding equation.
Upon expanding the logarithm in a power series and collecting the
results according to descending powers of $N$, we obtain

$\log \text{L.M.} = -\sum_{i=1}^{k}\left(\sigma_i t_i + \frac{\sigma_i^2 t_i^2}{2Np_i} + \text{terms of lower order}\right)$.




Let each $m_i$ in $\sum m_i = N$ be transformed in accordance with (4).
The result is

$\sum_1^k \sigma_i t_i + N\sum_1^k p_i = N$,

whence it follows that $\sum \sigma_i t_i = 0$ since $\sum p_i = 1$. Therefore,
remembering the value of $\sigma_i^2$, (3) may be written

$f(m_1, m_2, \cdots, m_k) = (2\pi N)^{(1-k)/2}(p_1 p_2 \cdots p_k)^{-1/2}\, e^{-\frac{1}{2}\sum (1 - p_i)t_i^2}$.

The form of the exponent of $e$ suggests the substitution of a new
variable $x_i = t_i(1 - p_i)^{1/2}$ in place of $t_i$. Upon making this substitution
we have

(5) $f(m_1, m_2, \cdots, m_k) = (2\pi N)^{(1-k)/2}(p_1 p_2 \cdots p_k)^{-1/2}\, e^{-\frac{1}{2}\sum x_i^2}$

where $x_i = (m_i - Np_i)(Np_i)^{-1/2}$ and $Np_i$ is the mean or expected
value of $m_i$.
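
How closely (5) follows the exact law (1) can be seen numerically. The Python sketch below compares the two for a hypothetical three-category case with $N = 100$; the function names and the counts are illustrative assumptions. Agreement to within about one per cent is to be expected here because every $m_i$ is fairly large.

```python
import math


def exact_multinomial(counts, probs):
    """Exact general term (1) of the multinomial law."""
    orders = math.factorial(sum(counts))
    for m in counts:
        orders //= math.factorial(m)
    return orders * math.prod(p ** m for p, m in zip(probs, counts))


def approximate_multinomial(counts, probs):
    """Approximation (5), with x_i = (m_i - N p_i) / sqrt(N p_i)."""
    n_trials = sum(counts)
    k = len(counts)
    sum_x_sq = sum((m - n_trials * p) ** 2 / (n_trials * p)
                   for m, p in zip(counts, probs))
    return ((2.0 * math.pi * n_trials) ** ((1 - k) / 2)
            * math.prod(probs) ** -0.5
            * math.exp(-0.5 * sum_x_sq))


# Hypothetical case: k = 3 categories, N = 100 trials.
counts = [32, 27, 41]
probs = [0.3, 0.3, 0.4]
exact = exact_multinomial(counts, probs)
approx = approximate_multinomial(counts, probs)
print(exact, approx, approx / exact)
```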

Now, following Wilks,$^2$ the $x$’s are independent except for the single
linear restriction $\sum (Np_i)^{1/2}x_i = 0$. Let $R$ be the region in the $x$-space
subject to the linear restriction just given corresponding to any
region $R_m$ in the $m$-space. Since the $m$’s are always integers, the
change in $x_i$ corresponding to a change of unity in $m_i$ is $(Np_i)^{-1/2} = \Delta x_i$.
Treating $k - 1$ of the $x$’s, say $x_1, x_2, \cdots, x_{k-1}$, as the independent
variables, and using an extension of the fundamental theorem on the
existence of a definite integral (Riemann), we have

(6) $\lim_{N \to \infty}\sum_R f(x_1, x_2, \cdots, x_k)\,\Delta x_1 \Delta x_2 \cdots \Delta x_{k-1} = \frac{1}{(2\pi)^{(k-1)/2}(p_k)^{1/2}}\int_R e^{-\frac{1}{2}\sum x_i^2}\,dx$

where for a given $N$, $\sum_R$ denotes the summation over all points in the
region $R$ corresponding to those in $R_m$ for which $f(m_1, m_2, \cdots, m_k)$
is defined. The integral is $(k - 1)$-dimensional, and $dx = dx_1\,dx_2 \cdots dx_{k-1}$.

2. The $\chi^2$ Distribution. The quantity

(7) $\sum_1^k x_i^2 = \chi^2$

is used as an index of the extent to which the set of $m$’s taken as a
whole cluster about their respective expected values. Later on we
will explain the practical import of this index. For the present we





confine our attention to the purely mathematical problem of finding
the distribution function of $\chi^2$. First, we consider the problem of
finding the distribution function of $\chi$. To this end we observe that,
corresponding to different values of $\chi$, (7) defines a set of $k$-dimensional
hyperspheres all having their centers at the origin of the $x_i$-axes
and no two intersecting. Now we can obtain the distribution of $\chi$
by determining the value of the integral in (6) when $R$ consists of the
region bounded by the concentric hyperspheres

(8) $\sum_1^k x_i^2 = \chi^2$ and $\sum_1^k x_i^2 = (\chi + d\chi)^2$

subject to the condition that

(9) $\sum_1^k (Np_i)^{1/2}x_i = 0$.

Since this last equation is a hyperplane through the common center
of the hyperspheres, the region $R$ is therefore a “shell” of a
$(k - 1)$-dimensional hypersphere. Within this shell

$e^{-\frac{1}{2}\sum x_i^2} = e^{-\frac{1}{2}\chi^2}$

to within terms of order $d\chi$.

Now it can be shown that the volume $V$ of an $s$-dimensional hypersphere
of radius $r$ is

$V = Cr^s$

where $C$ is independent of $r$. The volume between two concentric
hyperspheres of radii $r$ and $r + dr$ is therefore approximately

(10) $dV = C'r^{s-1}\,dr$.

Returning to the $\chi$ problem, it is clear from (10) that if the region
bounded by the hyperspheres in (8), subject to the restriction given
by (9), is chosen as the element of volume, then the probability that
$\left(\sum x_i^2\right)^{1/2}$ lie in the interval from $\chi$ to $\chi + d\chi$ is

(11) $df = Ke^{-\frac{1}{2}\chi^2}\chi^{k-2}\,d\chi$.

Here $K$ is independent of $\chi$ and can be determined by the condition

$\int_0^{\infty} Ke^{-\frac{1}{2}\chi^2}\chi^{k-2}\,d\chi = 1$.

Using the Gamma function, we find

$K = \frac{1}{2^{(k-3)/2}\,\Gamma\left(\frac{k-1}{2}\right)}$.



The distribution of $\chi^2$ is thus given by

(12) $T_{k-1}(\chi^2)\,d(\chi^2) = \frac{(\chi^2)^{(k-3)/2}\,e^{-\frac{1}{2}\chi^2}}{2^{(k-1)/2}\,\Gamma\left(\frac{k-1}{2}\right)}\,d(\chi^2)$.

The number $k - 1$ is the number of degrees of freedom which is the
number of $x$’s which are independent in (6).

3. Tables. The probability of obtaining a sample of $x$’s for which
$\sum x_i^2$ is greater than an assigned $\chi^2$, say $\chi_0^2$, is given by

(13) $P(\chi^2 > \chi_0^2) = \int_{\chi_0^2}^{\infty} T_{k-1}(\chi^2)\,d(\chi^2)$.

The symbol on the left in (13) may be abbreviated to $P$ when there is
no ambiguity. It is obvious that $\chi^2$ is never negative and may vary
from 0 (when there is no difference between the observed and expected
frequencies) to very large values. As $\chi^2$ increases from 0 to
$\infty$, the probability $P$ given by (13) decreases from 1 to 0. The student
will recognize $T_{k-1}(\chi^2)$ as a Pearson Type III curve and the
integral in (13) as essentially an incomplete Gamma function. Values
of $P$ can be found in Pearson’s Tables$^3$ and we have included in the
Appendix (see Table III) a short table, from Fisher’s book,$^4$ giving
values of $\chi^2$ corresponding to specially selected values of $P$. In our
table, $n = k - 1$.

For fairly large values of $k$, $(2\chi^2)^{1/2}$ is approximately normally
distributed about a mean $(2k - 1)^{1/2}$ with unit standard deviation.
Therefore, one may refer

$t = (2\chi^2)^{1/2} - (2k - 1)^{1/2}$

to a normal probability scale when $k > 30$.

4. Applications. The $\chi^2$-test was designed by its originator,
Karl Pearson,$^5$ as a criterion for testing hypotheses about frequency
distributions. These hypotheses may be classified into two types
which we will call simple and composite. We are making an explicit
distinction between them and considering them separately to avoid
certain misunderstandings which have sometimes occurred, in the
past, in the applications of the test. To be more specific, there has
been, as a result of confounding hypotheses to be tested, some controversy
over the appropriate number of degrees of freedom to use in
entering the tables for $P(\chi^2 > \chi_0^2)$.




Simple Hypothesis. Under this heading we will consider those
cases in which the theoretical frequencies are known a priori, that is,
when they are not inferred in any way from the sample.

Suppose that we have a set of $k$ observed frequencies

$m_1 + m_2 + \cdots + m_k = N$

constituting a sample from a hypothetical universe (supposedly
infinite) in which the relative frequencies in the $k$ categories are
known to be $p_1, p_2, \cdots, p_k$, respectively, where $p_i = \bar{m}_i/N$. Then,
corresponding to the observed frequencies, we have a set of $k$ theoretical
frequencies such that

$\bar{m}_1 + \bar{m}_2 + \cdots + \bar{m}_k = N$.

An example would be, for the $m$’s, the frequency of heads obtained in
tossing $N$ coins $k$ times, and, for the $\bar{m}$’s, the corresponding theoretical
frequencies given by the terms in the expansion of the binomial
$N\left(\frac{1}{2} + \frac{1}{2}\right)^k$. In comparing the observed and theoretical frequencies
a question quite naturally arises as to whether the aggregate discrepancy
between them could be explained on the basis of chance
fluctuations under the hypothesis that $\frac{1}{2}$ is the probability of success
in each trial. More generally, we are interested in such a question
as the following. On the hypothesis that an observed distribution is
a random sample from a proposed universe, what is the probability
that, taken as a whole, the discrepancy between theory and observation
would yield a value of $\chi^2$ as large as, or larger than, the value
obtained? The hypothesis is to be rejected whenever the probability
is considered “small.”

If we let $x_i = (m_i - \bar{m}_i)/\sqrt{\bar{m}_i}$ it is clear that the $x$’s are subject to
the linear homogeneous restriction given by (9) with $n = k - 1$
degrees of freedom because, if $k - 1$ of the $x$’s are fixed, the $k$th is
determined. In the case of a simple hypothesis, then, Fisher’s table
of $P$ is to be entered with $n = k - 1$.

With regard to levels of significance, Fisher 4 says: 

In preparing this table we have borne in mind that in practice we do not want
to know the exact value of $P$ for any observed $\chi^2$, but, in the first place, whether
or not the observed value is open to suspicion. If $P$ is between .1 and .9 there
is certainly no reason to suspect the hypothesis tested. If it is below .02 it is
strongly indicated that the hypothesis fails to account for the whole of the facts.
We shall not often be astray if we draw a conventional line at .05, and consider
that higher values of $\chi^2$ indicate a real discrepancy.





Composite Hypothesis. In the majority of practical cases, the
frequencies are not known a priori and must be estimated from the
sample. Thus, in a graduation by means of the normal curve the
theoretical frequencies are obtained by imposing the conditions that
the universe has the same mean and standard deviation as the sample.
The $\chi^2$-test can be accurately applied only if allowance is made for
the number of parameters which are determined from the sample in
reconstructing the universe. Suppose there are $q$ parameters in the
function representing the universe and these are to be determined
from the sample by the principle of moments. Since any moment is
a linear function of the frequencies (it will be remembered that the
frequencies are the variables in this discussion), the determination
of the $q$ parameters involves $q$ linear restrictions. We have seen in
§ 2 that the restriction imposed by (9) reduced our problem from a
space of $k$ dimensions to a space of $k - 1$ dimensions. Quite analogously,
$q$ additional linear restrictions reduce the space to $k - 1 - q$
dimensions. Accordingly,* in testing divergence from a universe
specified by a function $f(v, a, b, c, \cdots)$ where $v$ is the variable of the
distribution and $a, b, c, \cdots$ are $q$ disposable parameters which are
to be estimated from the sample, the number of degrees of freedom
with which to enter the tables of $P$ is $n = k - 1 - q$.

The following two conditions should be fulfilled in applying the
$\chi^2$-test (for both simple and composite hypotheses).

1. No class should contain very few items because, in the derivation
of (11), it was assumed that $m_i$ was sufficiently large to replace
$m_i!$ by its Stirling approximation.

2. The number of classes should not be very large since it can be
shown, by expanding the integrand in (13) into a power series, that
$P \to 1$ as $k \to \infty$.

We shall interpret, somewhat arbitrarily, these conditions to mean
that $P$ cannot be guaranteed when $m < 5$ and $k > 20$. To satisfy
the first condition, it is customary to lump together the small frequencies
at the ends of the distribution.

Example 1. Twelve dice were thrown 4096 times; only a throw of six was
counted a success. The expected frequencies are given by $4096\left(\frac{5}{6} + \frac{1}{6}\right)^{12}$.
How improbable, taken as a whole, is the observed distribution shown in
Table 18?

* Strictly speaking, the determination of the parameters by the method of 
moments does not lead to a system of equations which are exactly analogous 
to (9). 





Table 18

  Number of      Observed         Theoretical            $(m - \bar{m})^2$    $(m - \bar{m})^2/\bar{m}$
  Successes      Frequency $m$    Frequency $\bar{m}$
  0                 447              459                    144               .3137
  1                1145             1103                   1764              1.5993
  2                1181             1213                   1024               .8442
  3                 796              809                    169               .2089
  4                 380              364                    256               .7033
  5                 115              116                      1               .0086
  6                  24               27                      9               .3333
  7 and over          8                5                      9              1.8000
  Totals           4096             4096                                     $\chi^2 = 5.8113$


Entering Table III (see Appendix) with $n = 8 - 1 = 7$, and interpolating for
the value of $P$ corresponding to the observed value of $\chi^2 = 5.8113$, we find
$P = .56$. Hence there is no reason to reject the hypothesis that the underlying
chance of a “success” is $p = \frac{1}{6}$. That is, there is no reason to suspect that the
dice were biased.
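
The computation in Example 1 can be reproduced with the short Python sketch below, using the frequencies of Table 18. In place of interpolation in Table III, the sketch evaluates $P$ from a standard closed form for the upper tail of the $\chi^2$ distribution that holds when the number of degrees of freedom is odd. That formula and the function names are substitutions made for illustration, not the method of the text.

```python
import math

observed = [447, 1145, 1181, 796, 380, 115, 24, 8]
theoretical = [459, 1103, 1213, 809, 364, 116, 27, 5]


def chi_square(observed, theoretical):
    """Sum of (m - m_bar)^2 / m_bar over the classes."""
    return sum((m - m_bar) ** 2 / m_bar
               for m, m_bar in zip(observed, theoretical))


def upper_tail_odd(chi_sq, n):
    """P(chi^2 > chi_sq) for an odd number n of degrees of freedom.

    Closed form: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * S, where
    S = 1 + x/3 + x^2/(3*5) + ... taken to (n - 1)/2 terms.
    """
    if n < 1 or n % 2 == 0:
        raise ValueError("this closed form applies to odd n only")
    term, series = 1.0, 0.0
    for j in range(1, (n - 1) // 2 + 1):
        if j > 1:
            term *= chi_sq / (2 * j - 1)
        series += term
    return (math.erfc(math.sqrt(chi_sq / 2.0))
            + math.sqrt(2.0 * chi_sq / math.pi)
            * math.exp(-chi_sq / 2.0) * series)


chi_sq = chi_square(observed, theoretical)
degrees_of_freedom = len(observed) - 1   # simple hypothesis: n = k - 1
p_value = upper_tail_odd(chi_sq, degrees_of_freedom)
print(chi_sq, degrees_of_freedom, p_value)
```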

Example 2. An observed distribution was graduated by means of the normal
curve (see Part I, p. 123) with the results shown in Table 19. Test the hypothesis
that the observed distribution was a sample from a normal universe with
mean and standard deviation equal respectively to those of the sample.


Table 19

  Central         Observed        Theoretical
  Values          Frequency       Frequency
  29.5, 33.5         16
  37.5               56              60.2
  41.5              172             155.4
  45.5              245             252.6
  49.5              263             258.8
  53.5              156             167.2
  57.5               67              68.0
  61.5, 65.5
  Totals           1000            1000.0


It is found that $\chi^2 = 4.82$. After pooling the end frequencies, as shown, $k = 8$.
So entering Table III for $n = 8 - 1 - 2 = 5$, we find that $P > .4$. Hence the
$\chi^2$-test does not reject the hypothesis.

For applications of the $\chi^2$-test to contingency tables the reader is referred to
Fisher’s book.






B. STATISTICAL INFERENCE 


5. Induction versus Deduction. To contrast the inductive problems,
which we are about to consider, with deductive problems, we
shall review briefly a deductive type of argument which we have
previously discussed. Suppose $D(t)$ is the distribution function of
a statistic $t$ computed from a sample from a universe specified with
respect to functional form and parameters. Then $\int_{-\infty}^{\delta} D(t)\,dt$ gives
the probability that an observed value of $t$ will not exceed an assigned
value of $\delta$. Thus in Chapter VI we learned that the means of
samples cluster about the mean of the universe, and Theorem X
of that chapter gave us the probability that a sample mean would
have a numerical value within $\delta$ of the mean of the universe. This
is a deductive argument. Presently we shall consider certain inverse
problems which arise in arguing from samples and their statistics
back to universes and their parameters. First, however, we shall
examine Bayes’ Theorem. The following quotation from R. A.
Fisher$^6$ will serve as a setting for our consideration of this theorem.


Thomas Bayes’ paper of 1763 was the first attempt known to us to rationalize
the process of inductive reasoning. From time immemorial, of course, men had
reasoned inductively; sometimes, no doubt, well, and sometimes badly, but
the uncertainty of all such inferences from the particular to the general had
seemed to cast a logical doubt on the whole process. By the middle of the
eighteenth century, however, experimental science had taken its first strides, and
all the learned world was conscious of the effort to enlarge knowledge by
experiment, or by carefully planned observation. To such an age the limitations of a
purely deductive logic were intolerable. Yet it seemed that mathematicians
were willing to admit the cogency only of purely deductive reasoning. From
an exact hypothesis, well defined in every detail, they were prepared to reason
with precision as to its various particular consequences. But, faced with a
finite, though representative, sample of observations, they could make no
rigorous statements about the population from which the sample had been drawn.

Bayes perceived the fundamental importance of this problem and framed an
axiom, which, if its truth were granted, would suffice to bring this large class of
inductive inferences within the domain of the theory of probability; so that,
after a sample had been observed, statements about the population could be
made, uncertain inferences, indeed, but having the well-defined type of
uncertainty characteristic of statements of probability. Bayes’ technique in this
feat is ingenious. His predecessors had supplied adequate methods, given a
well-defined population, for stating the probability that any particular type of
population might have given rise to it. He imagines, in effect, that the possible
types of population have themselves been drawn, as samples, from a
super-population, and his axiom defines this super-population with exactitude. His
problem thus becomes a purely deductive one to which familiar methods were
applicable.

6. Bayes’ Theorem. To derive Bayes’ theorem, consider a bivariate
universe of discrete variables in which x takes the values
$x_1, x_2, \dots, x_n$, and y the values $y_1, y_2, \dots, y_m$. Let $P(x_i, y_j)$
represent the probability for the joint occurrence of $(x_i, y_j)$. Let
$P(y_j \mid x_i)$ be the probability that y takes the value $y_j$ when it is
known that x has taken the value $x_i$. Then

(14)    $P(y_j \mid x_i) = \dfrac{P(x_i, y_j)}{g(x_i)}$,

where $g(x_i) = \sum_j P(x_i, y_j)$ is the marginal distribution of x in the
bivariate universe and represents the a priori probability that x takes
the value $x_i$. Let us write (14) in the form

(15)    $P(x_i, y_j) = g(x_i)\,P(y_j \mid x_i)$.

By a similar argument we may write

(16)    $P(x_i, y_j) = h(y_j)\,P(x_i \mid y_j)$,

where $h(y_j) = \sum_i P(x_i, y_j)$ is the marginal distribution of y, and
$P(x_i \mid y_j)$ is the probability that $x = x_i$ when it is known that $y = y_j$.
It is clear from the preceding relations that

(17)    $h(y_j) = \sum_i g(x_i)\,P(y_j \mid x_i)$.

Since $P(x_i, y_j)$ means exactly the same thing in (15) and (16) we may
equate their right members and solve for $P(x_i \mid y_j)$. The result is

(18)    $P(x_i \mid y_j) = \dfrac{g(x_i)\,P(y_j \mid x_i)}{h(y_j)}$.

This is Bayes’ theorem and it may be stated as follows.

Bayes’ Theorem. The probability that $x = x_i$ when $y = y_j$ is equal
to the product of the probabilities that $x = x_i$, and that $y = y_j$ when
$x = x_i$, divided by the probability that $y = y_j$.

The theorem is usually expressed symbolically in the somewhat
different form to which it reduces when (17) is substituted for the
denominator of (18). This form is

(19)    $P(x_i \mid y_j) = \dfrac{g(x_i)\,P(y_j \mid x_i)}{\sum_k g(x_k)\,P(y_j \mid x_k)}$.


To connect Bayes’ theorem with a posteriori, or inverse, probability
suppose in (19) that the x’s denote certain initial situations and
the y’s denote events subsequently observed. The a priori probability
for the existence (occurrence) of the initial situation characterized
by $x_i$ is $g(x_i)$. $P(y_j \mid x_i)$ is the a priori probability that $y_j$
will occur when $x_i$ exists. Then (19) gives the a posteriori probability
that the ith initial situation has produced the observed event
specified by $y_j$.

The following examples will clarify the theorem and serve to focus
attention on its weakness. The first example, a somewhat artificial
one, is designed to illustrate a situation where the existence
probabilities $g(x_i)$ are equal. The second will describe a situation when
nothing is known about them.


Example 3. (Molina⁷) During his sophomore year Tom Smith played on
both the baseball and football teams; we have been informed that he broke his
ankle in one of the games; what are the a posteriori probabilities in favor of
baseball and football, respectively, as the baneful cause of the accident?
Evidently the answer depends on the number of baseball and football games played
during their respective seasons and also on the likelihood of a man breaking an
ankle in one or the other of these two sports. As a concrete case assume that:

(a) At Smith’s college an equal number of baseball and football games are
played per season;

(b) Statistical records indicate that if a student participates in a baseball
game the probability is 2/100 that he will break an ankle and that, likewise, the
probability is 7/100 for the same contingency in a football game.

Solution. Associate $x_1$ and $x_2$ with the admissible causes, baseball and
football, respectively. Associate $y_1$ with the accident. From condition (a) of the
problem, the existence probability for baseball is $g(x_1) = 1/2$. Also
$P(y_1 \mid x_1) = 2/100$, and $P(y_1 \mid x_2) = 7/100$. From (19), then, the a posteriori
probability for baseball is

$P(x_1 \mid y_1) = \dfrac{\frac{1}{2}\cdot\frac{2}{100}}{\frac{1}{2}\cdot\frac{2}{100} + \frac{1}{2}\cdot\frac{7}{100}} = \dfrac{2}{9}.$

It follows that the a posteriori probability in favor of football is 7/9.
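
The arithmetic of Example 3 can be reproduced with exact fractions. The sketch below applies formula (19) directly; the function name `posterior` and the list layout are assumptions made for illustration only.

```python
from fractions import Fraction

def posterior(prior, likelihood):
    """Formula (19): prior[i] = g(x_i), likelihood[i] = P(y_j | x_i)."""
    joint = [g * p for g, p in zip(prior, likelihood)]
    total = sum(joint)            # this is h(y_j) of formula (17)
    return [j / total for j in joint]

prior = [Fraction(1, 2), Fraction(1, 2)]            # baseball, football
likelihood = [Fraction(2, 100), Fraction(7, 100)]   # P(broken ankle | sport)
post_baseball, post_football = posterior(prior, likelihood)
print(post_baseball, post_football)
```

Because the two existence probabilities are equal they cancel, and the result depends only on the ratio 2 : 7 of the accident probabilities.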

Example 4. An urn contains five balls, black, white, or both kinds. Of three
balls drawn together and at random (each ball within the urn is equally likely to
be drawn), two are black and one is white. What is the probability that the
urn contains three black and two white balls?

Solution. Associate $x_1, x_2, \dots, x_6$ with the possible compositions of the urn
before the drawing was made, namely, 0B, 5W; 1B, 4W; ...; 5B, 0W. Associate
$y_1, y_2, y_3, y_4$ with the possible compositions in the drawing of three balls, namely,
0B, 3W; 1B, 2W; 2B, 1W; 3B, 0W. The composition corresponding to $y_3$
was obtained and we seek the probability that it came from an urn with
composition specified by $x_4$. That is, we seek $P(x_4 \mid y_3)$. Clearly,

$P(y_3 \mid x_4) = \dfrac{C(3, 2)\,C(2, 1)}{C(5, 3)} = \dfrac{3}{5}$,

so from (19) we have

(20)    $P(x_4 \mid y_3) = \dfrac{\frac{3}{5}\,g(x_4)}{\sum_{i=1}^{6} g(x_i)\,\dfrac{C(i-1, 2)\,C(6-i, 1)}{C(5, 3)}}$,

it being understood, of course, that C(n, r) = 0 when n < r. (Here $x_i$ denotes
the urn with i - 1 black and 6 - i white balls, in agreement with the list above.)

Since the values of $g(x_i)$ are unknown the problem does not have
a unique solution. Moreover, if they were known we would be
back in the domain of deductive probabilities again since all the
probabilities in the right-hand member of (20) would then be known
a priori. It is only when the $g(x_i)$ are unknown that we are properly
in the domain of a posteriori probability. In practical problems
the $g(x_i)$ are scarcely ever known.
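
How strongly the answer to Example 4 depends on the unknown existence probabilities can be seen by evaluating (20) under two different assumptions. In the sketch below both sets of existence probabilities are hypothetical: one makes the six compositions equally likely, the other supposes each ball to have been independently black or white with probability 1/2. The function names are introduced here for illustration.

```python
from fractions import Fraction
from math import comb

def likelihood(black_in_urn, black_drawn=2, drawn=3, total=5):
    """P(drawing `black_drawn` black among `drawn` balls | urn composition)."""
    white_in_urn = total - black_in_urn
    ways = comb(black_in_urn, black_drawn) * comb(white_in_urn, drawn - black_drawn)
    return Fraction(ways, comb(total, drawn))

def posterior_three_black(prior):
    """Formula (20): prior[b] is the a priori probability of b black balls."""
    joint = [prior[b] * likelihood(b) for b in range(6)]
    return joint[3] / sum(joint)

uniform = [Fraction(1, 6)] * 6
# Hypothetical alternative: each ball independently black with probability 1/2.
binomial = [Fraction(comb(5, b), 32) for b in range(6)]

post_uniform = posterior_three_black(uniform)
post_binomial = posterior_three_black(binomial)
print(post_uniform, post_binomial)
```

The two assumptions give 2/5 and 1/2 respectively, so the drawing alone does not determine the a posteriori probability.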

Bayes realized this and argued that the x’s may be considered
equally probable unless we have some reason to think they are not.
Under this “doctrine of insufficient reason,” the x’s are assumed to
have equal existence probabilities. In this case, $g(x_i)$ = constant
and would cancel in (19), thus permitting a definite solution in (20).
It appears that Bayes had serious doubts about this “doctrine” for
he withheld his entire treatise from publication until his doubts
should be resolved, and it was only after his death that his paper
was published by friends. Laplace, however, was less cautious, and
he incorporated the doubtful theorem into his Théorie Analytique des
Probabilités. Robed in the authority of Laplace it went unquestioned
for a long time. Boole was the first, in 1854, to criticize the
assumption of “the equal distribution of our knowledge, or rather
of our ignorance” and “the assigning to different states of things of
which we know nothing, equal degrees of probability.” Today, it
is well known that the assumption of constant existence probabilities
may lead to mathematical contradictions. This may clearly be seen
in the analogue to (19) for continuous variables. The following
illustration of such a contradiction is cited by Wilks (loc. cit.).

Let $\theta$ be a parameter characterizing the universe and t a statistic
from the sample. Then the analogue to (19) for the continuous
case is

(21)    $F(\theta \mid t)\,d\theta = \dfrac{g(\theta)\,f(t \mid \theta)\,d\theta\,dt}{dt \int g(\theta)\,f(t \mid \theta)\,d\theta}$.

Now, if according to the “doctrine of insufficient reason” we may
assume $g(\theta)$ to be constant, (21) reduces to

(22)    $F(\theta \mid t)\,d\theta = \dfrac{f(t \mid \theta)\,d\theta}{\int f(t \mid \theta)\,d\theta}$.

But by the very nature of this “doctrine” there is no more reason
to assume the a priori probability function of $\theta$ to be constant than
there is to assume the a priori probability distribution of some
function of $\theta$, say $\theta^2$, to be constant. The a priori distribution
of $\theta^2 = z$ is $g(\sqrt{z})/2\sqrt{z}$. If $g(\sqrt{z})/2\sqrt{z}$ is constant, then

$F(\theta \mid t)\,d\theta = \dfrac{\theta\,f(t \mid \theta)\,d\theta}{\int \theta\,f(t \mid \theta)\,d\theta}$,

which is certainly inconsistent with (22).
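
A numerical case makes the inconsistency concrete. The sketch below assumes, purely for illustration, the likelihood $f(t \mid \theta) = \theta^t (1 - \theta)^{n - t}$ on $0 < \theta < 1$ with t = 3 and n = 10; neither the likelihood nor the numbers come from the text. It compares the a posteriori mean of $\theta$ when $\theta$ is taken as uniformly distributed with the mean obtained when $\theta^2$ is taken as uniformly distributed.

```python
def posterior_mean(weight, t, n, steps=20000):
    """A posteriori mean of theta for an a priori density proportional to weight(theta).

    Assumed likelihood: theta^t * (1 - theta)^(n - t) on 0 < theta < 1.
    The integrals are approximated by the midpoint rule.
    """
    num = den = 0.0
    for k in range(steps):
        theta = (k + 0.5) / steps
        w = weight(theta) * theta**t * (1.0 - theta)**(n - t)
        num += theta * w
        den += w
    return num / den

t, n = 3, 10
flat_in_theta = posterior_mean(lambda th: 1.0, t, n)     # g(theta) constant, as in (22)
flat_in_theta_sq = posterior_mean(lambda th: th, t, n)   # theta^2 uniform, g(theta) proportional to theta
print(round(flat_in_theta, 4), round(flat_in_theta_sq, 4))
```

The two means are close to 1/3 and 5/13, so the same observation leads to different conclusions according to which function of the parameter is declared to be equally distributed.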

In arguing from a sample to the universe, any inference must be 
attended with some degree of uncertainty. But uncertainty should 
not be confused with lack of rigor. As we shall see, statements can 
be made about population parameters, subject to risks of being 
wrong, where the error is precisely expressed in terms of probability 
theory. In other words, the nature and degree of the uncertainty 
can be rigorously expressed. This can be accomplished without any 
assumptions regarding the a priori existence probabilities. 

7. Probable Error. The following concise exposition of the various
usages of the term “probable error” is due to Professor A. T. Craig.

There are in the literature three conceptions of the probable error.
If, purely for convenience of language, we refer to the probable error
of the mean, these conceptions can be stated as follows: (i) The
probable error of the mean is that deviation, extended on both sides
of the mean of the population, such that 1/2 is the probability that the
mean of a sample will fall in this interval; (ii) The probable error of
a mean is that deviation, extended on both sides of the mean of a
sample, such that 1/2 is the probability that the mean of the population
lies in this interval; (iii) The probable error of the mean is that
deviation, extended on both sides of the mean of a sample, such that 1/2 is
the probability that the mean of another sample will fall in this
interval. Conception (i) leads without difficulty to the usual formula
$.6745\,(\sigma/\sqrt{N})$ for the probable error of the mean. This formula is
rigorously correct for samples of any size drawn from a normal
population and is valid for large samples drawn from any population with
finite variance. On the other hand, the formula cannot be
established under conception (ii) without further assumptions. If, before
the sample is drawn, it is assumed, in the absence of any knowledge
concerning the distribution of possible values of the mean of the
population, that the existence distribution is constant, then the
formula admits mathematical proof. But this assumption is
essentially the same assumption as that made in applying Bayes’ Theorem
to problems of probability a posteriori.
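
Conception (i) is easily verified for a normal population. The sketch below uses hypothetical values of $\sigma$ and N and checks that the interval of half-width $.6745\,\sigma/\sqrt{N}$ about the population mean contains one half of the sampling distribution of the mean.

```python
from statistics import NormalDist

sigma, N = 10.0, 25                         # hypothetical universe s.d. and sample size
probable_error = 0.6745 * sigma / N**0.5
# Distribution of (sample mean - universe mean) for a normal universe.
sampling_dist = NormalDist(mu=0.0, sigma=sigma / N**0.5)
inside = sampling_dist.cdf(probable_error) - sampling_dist.cdf(-probable_error)
print(round(inside, 4))
```

The result does not depend on the particular $\sigma$ and N chosen, since both cancel when the deviation is expressed in units of $\sigma/\sqrt{N}$.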

The modern method of expressing the reliability of a statistical
estimate of a population parameter in terms of fiducial limits seems
likely to replace the traditional but often misleading mode of
expression involving probable error. The rest of the chapter is devoted to
this recent advance in statistical inference.

8. Fiducial Theory. The material of this section is reproduced
from a recent paper on this subject by Rietz.⁸

In explaining the meaning of the probable error of a statistic, one
of the usual types of definition is essentially the following: The
probable error of a statistic, t, is a positive number, $E_t$, such that the
chances are even that the population parameter of which t is an estimate
from the sample, will fall within the interval $t - E_t$ to $t + E_t$.

This definition contains an inference about the values of a
population parameter on the basis of information obtained from a random
sample drawn from the population.

Formulas for $E_t$, in terms of observed data, when t may represent
any one of a considerable number of statistics, say an arithmetic
mean or a correlation coefficient, are usually listed for convenient
application in numerous textbooks for teaching courses in statistics.

Under the definition stated above, it is noteworthy that these
formulas depend on a fundamental assumption whose validity has long
been in doubt. The assumption in question is to the effect that
initially, that is, before our drawings of a sample are made, in our
lack of knowledge about the distribution of possible values of an
unknown parameter, say of $\theta$, we may assume the existence
distribution of $\theta$ to be constant.

The invalidity of this assumption in many applied problems of
statistical interest may be seen clearly in cases of a continuous
distribution function with a derivative. Suppose that our initial
assumptions relating to a parameter $\theta$ were such that $\theta$ would initially be
distributed in accord with a continuous frequency function, $g(\theta)$,
which has a derivative at each point within its possible range on $\theta$,
say from $\theta = \alpha$ to $\theta = \beta$. Next, suppose $g(\theta)$ were restricted to be
constant throughout the range of $\theta$. Then it is well known that the
distribution of a simple non-linear function of $\theta$ would not be
constant. For example, the distribution of $z = \theta^n$ ($n \neq 1$, $\theta$ real and
non-negative) would not be constant, but would be distributed in
accord with a frequency function $(1/n)\,z^{(1-n)/n}$. But if $\theta$ is a
population parameter, it seems fairly obvious that the logical character
of our theory should usually, if not always, be such as to enable us to
use a power of $\theta$ as a parameter if we found it convenient to do so.

The preceding introduction is designed to lead up to the important
fact that, although in the usual statistical inquiry by sample, the
true value of the population parameter $\theta$ is unknown and remains
unknown, there are cases in which precise statements can be made in
terms of probabilities about the bounds within which a parameter $\theta$
lies without making an assumption about the initial distribution of the
possible values of $\theta$. It has been only about seven years since R. A.
Fisher initiated some important ideas in this connection to which
interesting contributions have been made by several mathematical
statisticians.⁹⁻¹¹

For simplicity, consider a case of a single parameter, $\theta$, in which
we know the frequency function of the statistic, t, to be given by an
integrable function

(23)    $y_t = f(t, \theta)$,

where the values of t obtained from observation may be assumed to
be good estimates of $\theta$. Suppose we know (23) in such form that it is
possible to calculate a table of values of the probabilities that the
statistic, t, will fall into an assigned interval selected on a possible
range (a, b) for any assigned value of $\theta$ within the possible range
$(\alpha, \beta)$ of $\theta$.

Next, for illustration, select a positive number $\epsilon$, say $\epsilon = .005$,
on which to base a certain level of confidence about values of $\theta$ to be
expressed in terms of probabilities.

As our main problem may be clarified by a geometrical
representation, conceive of corresponding values of t and $\theta$ obtained in an
extensive statistical experiment as represented by rectangular
coordinates within the rectangle bounded by lines t = a, t = b, $\theta = \alpha$, $\theta = \beta$.
(Fig. 23.)

Consider an arbitrary assignment for $\theta$, say that $\theta = \theta'$ is the true
value of $\theta$. This gives the line AB (Fig. 23). Since the distribution
of the statistic t is assumed to be known for each assigned value of $\theta$,
we may locate on the line AB two points, $t_1$ and $t_2$ ($t_1 \leq t_2$), such that
$\epsilon$ is equal to the probability that a random sample will yield a value
of t less than or equal to $t_1$, and similarly $\epsilon$ is the probability that such
a sample will yield a value greater than or equal to $t_2$. Then we have
an interval on AB from $t_1$ to $t_2$ such that $1 - 2\epsilon$ is the probability that
the random sample will yield a value within this interval.

Fig. 23

More formally stated, we may introduce a function $F(t, \theta)$ defined
as the definite integral of $f(t, \theta)$ in (23) from t = a to t. That is,

$F(t, \theta) = \int_a^t f(t, \theta)\,dt$,

for any arbitrarily assigned real value of $\theta$ on its range from $\alpha$ to $\beta$.
Then

$F(a, \theta) = 0$,  $F(b, \theta) = 1$,  $F(t_1, \theta') = \epsilon$,  $F(t_2, \theta') = 1 - \epsilon$,  $(0 < \epsilon < 1)$.


By considering all possible assignments of $\theta$, in its possible range
$(\alpha, \beta)$, the locus of our set of lower values of t, illustrated by $t_1$ on the
line AB, will give a continuous curve which we mark with $C_\epsilon$ in
Figure 23, the subscript $\epsilon$ being used to remind us that $\epsilon$ is the
probability that a random value of t for $\theta = \theta'$ will fall below or at $t_1$.
Similarly, our set of upper values of t, illustrated by $t_2$ on AB, give
a curve which we mark with $C_{1-\epsilon}$.

If t is a good estimate of $\theta$, its value usually, if not always, increases
with $\theta$ for all possible values. Thus, we shall restrict our further
considerations to cases in which we may assume that t increases as $\theta$
increases and vice versa. More precisely we are concerned with
one-valued monotone increasing functions represented by the two
curves marked $C_\epsilon$ and $C_{1-\epsilon}$. The region bounded by these two curves
and the lines $\theta = \alpha$ and $\theta = \beta$ has been called by Neyman the
confidence belt with confidence coefficient equal to $1 - 2\epsilon$.

Next, consider the set of points, $(t, \theta)$, that would be obtained in
Figure 23 in carrying out an extensive statistical experiment for
which we seek a degree of accuracy in the long run, indicated by the
value we assign to $\epsilon$. Then it is fairly obvious that the confidence
belt is so constructed that $1 - 2\epsilon$ is the expected relative
frequency with which points, $(t, \theta)$, will lie inside the confidence belt,
and $2\epsilon$ is the expected relative frequency with which such points
will lie outside the confidence belt or on its boundary, whatever
the nature of the initial distribution function of the parameter $\theta$
may be.
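
The claim that the relative frequency $1 - 2\epsilon$ holds whatever the initial distribution of $\theta$ may be can be illustrated by simulation. The sketch below assumes, for illustration only, that t is normally distributed about $\theta$ with unit standard deviation, so that the curves $C_\epsilon$ and $C_{1-\epsilon}$ are the lines $t = \theta - z$ and $t = \theta + z$; the two initial distributions of $\theta$ are likewise hypothetical.

```python
import random
from statistics import NormalDist

def coverage(draw_theta, eps=0.05, trials=20000, seed=1):
    """Fraction of simulated points (t, theta) lying inside the confidence belt.

    Assumed model: t is normal with mean theta and unit standard deviation.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1.0 - eps)   # tail area eps beyond theta + z
    hits = 0
    for _ in range(trials):
        theta = draw_theta(rng)           # any initial distribution of theta
        t = rng.gauss(theta, 1.0)
        if theta - z < t < theta + z:
            hits += 1
    return hits / trials

cov_uniform = coverage(lambda rng: rng.uniform(0.0, 10.0))
cov_skewed = coverage(lambda rng: rng.expovariate(0.5))
print(cov_uniform, cov_skewed)
```

With $\epsilon = .05$ both relative frequencies come out near .90, although the two initial distributions of $\theta$ are quite unlike.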

Conceive of drawing a large number of sets of random samples
of N items each from a population consisting either of an infinite
supply or of a finite supply with replacements, and suppose that one of
these samples, taken at random, yields a value $t = t_0$ for a certain
statistic. Then the line $t = t_0$ parallel to the $\theta$-axis would fail to
intersect the boundaries of the confidence belt, in two points, in at most
a small fractional part (less than $2\epsilon$) of the total number of sets of
drawings. Denote the ordinates of the points in which the line
$t = t_0$ cuts the curves $C_{1-\epsilon}$ and $C_\epsilon$ by $\theta_1$ and $\theta_2$, respectively (Figure 23).
These boundary values of $\theta$ are called fiducial limits of $\theta$ that
correspond to $t = t_0$ and the interval $\theta_1$ to $\theta_2$ is called the fiducial interval
for $t = t_0$. It is important to emphasize that the statement that
$1 - 2\epsilon$ is the probability that a value of $\theta$ taken at random will fall
into the confidence belt is to be associated with the whole belt,
that is, with results of repeated application of a sampling procedure
to all values of t met with in an extensive statistical experiment, and
not merely with an assigned t. The probability that $(\theta, t)$ falls
within the confidence belt may differ for different assignments of t,
but in the long run of statistical experience, the expected relative
frequency of points within the confidence belt is $1 - 2\epsilon$. By choosing
$\epsilon$ to be small, the probability is nearly 1 that the parameter lies
within the confidence belt.

The theory of confidence belts and fiducial intervals finds its
main application in the testing of a certain hypothesis for possible
rejection under the assumption that it is true. Such an hypothesis
has been termed a null hypothesis. If, for a given $\epsilon$, the null
hypothesis is rejected due to the value of t found from the actual data, the
value of t is said to be significant at the level of probability equal
to $2\epsilon$. On the other hand, a value of t from observed data which
does not reject the null hypothesis is said to be non-significant.

9. Fiducial Limits. (a) For the mean. Let $\bar{x}$ and $s$ be the mean
and standard deviation of a sample of N = n + 1 items drawn from
a normal universe with unknown mean $\tilde{x}$. The problem is to
determine an interval surrounding $\bar{x}$ in which we may assume, with a
certain degree of confidence, that $\tilde{x}$ is contained. We learned in
Chapter VII that the variable

(24)    $t = \dfrac{\sqrt{n}\,(\bar{x} - \tilde{x})}{s}$

is distributed in accord with the $F_n(t)$ curve and that $P = 1 - P_n(t)$
has been tabulated for various values of t and n, where

$P_n(t) = 2\int_0^t F_n(t)\,dt.$

Therefore, for an assigned $\epsilon$ and for an assigned value of n (n < 30),
we may obtain from the tables upper and lower critical values of t
by solving the equation $P = 2\epsilon$. With these critical values we can
determine from (24) the required interval surrounding $\bar{x}$ for the
given value of $\epsilon$. It is conventional among certain workers to take
$\epsilon = .005$ (or .025) since they wish to determine values of the estimates
of $\tilde{x}$ in an interval dividing hypotheses that will be rejected from
those acceptable under a null hypothesis at the 1% (or 5%) level of
significance.

Suppose, then, that we make the claim

(25)    $\bar{x} - \dfrac{t_\epsilon s}{\sqrt{n}} < \tilde{x} < \bar{x} + \dfrac{t_\epsilon s}{\sqrt{n}}$

and we desire the probability of an error in this statement to be not
more than $2\epsilon = .01$. Taking n = 15, for example, we find from
Table 12, Chapter VII, that t = ±2.947 when P = .01. Then we
have

$(\bar{x} - \tilde{x}) = \dfrac{\pm 2.947\,s}{\sqrt{15}} = \pm .76s$

and the claim

$\bar{x} - .76s < \tilde{x} < \bar{x} + .76s$

will be correct 99% of the time.

It is clear from the above procedure that our confidence in the
fiducial limits $\bar{x} \pm t_\epsilon s/\sqrt{n}$ is measured by the area under the $F_n(t)$
curve inside $t = \pm t_\epsilon$, that is, by $P_n(t_\epsilon)$. This means that if we
could observe all possible samples, the proportion represented by
$P_n(t_\epsilon)$ would yield values of $\bar{x}$ and $s$ for which the claim (25) is true,
while the remaining proportion, $P = 1 - P_n(t_\epsilon)$, would yield values
of $\bar{x}$ and $s$ for which the claim is false.



Fig. 24 


If we were testing a hypothetical value of $\tilde{x}$ we would say that
$\bar{x}$ is not significant at the 1% level of significance if $\tilde{x}$ has any value
in the $\bar{x} \pm t_\epsilon s/\sqrt{n}$ interval, $\epsilon = .005$. If $\tilde{x}$ does not lie in this
interval we say that $\bar{x}$ is significant at this level.

Obviously, values of t satisfying the equation P = .01, that is,
$P_n(t) = .99$, vary with n. To avoid the trouble of entering a table
we give an alternate method which is valid when the sample is not
small. Recall that the variable

$t = \dfrac{(\bar{x} - \tilde{x})\sqrt{N - 3}}{s}$

is approximately normally distributed when N > 30. The area
under the normal curve outside t = ±2.576 is .01. Therefore, the
99% fiducial range of $\tilde{x}$ is then

$\bar{x} \pm \dfrac{2.576\,s}{\sqrt{N - 3}}$

and the range gets smaller as N increases.
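
The two procedures of this subsection can be set side by side. In the sketch below the sample mean and standard deviation are hypothetical, the critical value 2.947 is the one quoted above for n = 15, and the function names are introduced here for illustration.

```python
from math import sqrt
from statistics import NormalDist

def limits_small_sample(xbar, s, n, t_crit):
    """Limits xbar -/+ t_crit * s / sqrt(n); t_crit is read from a table of t for n = N - 1."""
    half = t_crit * s / sqrt(n)
    return xbar - half, xbar + half

def limits_large_sample(xbar, s, N, eps=0.005):
    """Limits xbar -/+ z * s / sqrt(N - 3); z is the normal deviate with tail area eps."""
    z = NormalDist().inv_cdf(1.0 - eps)
    half = z * s / sqrt(N - 3)
    return xbar - half, xbar + half

# Hypothetical sample with mean 50 and s = 1.
lo_small, hi_small = limits_small_sample(50.0, 1.0, 15, 2.947)
lo_large, hi_large = limits_large_sample(50.0, 1.0, 103)
print(round(hi_small - 50.0, 2), round(hi_large - 50.0, 4))
```

With s = 1 the first half-width reproduces the factor .76 found above, and the second shows how the range narrows once N - 3 = 100.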

(b) For the difference between two means. Let $\bar{x}_1$ and $s_1^2$ be the
observed mean and variance of a sample of $N_1$ drawn from a normal
universe with unknown mean $\tilde{x}_1$ and let $\bar{x}_2$ and $s_2^2$ be the observed
mean and variance of a sample of $N_2$ drawn from a normal universe
with unknown mean $\tilde{x}_2$. It is assumed that the two universes have
a common variance $\sigma^2$. For brevity, let $w = \bar{x}_1 - \bar{x}_2$,
$\tilde{w} = \tilde{x}_1 - \tilde{x}_2$, and $N = N_1 + N_2$. Then the variable

(26)    $t = \dfrac{w - \tilde{w}}{\left[\dfrac{N_1 s_1^2 + N_2 s_2^2}{N - 2}\left(\dfrac{1}{N_1} + \dfrac{1}{N_2}\right)\right]^{1/2}}$

is distributed in accord with $F_n(t)$ for n = N - 2. From (26),
upper and lower fiducial values of $\tilde{w}$ can be found by assigning to t
the solutions of $P_n(t) = .99$, that is, of P = .01. If the value
$\tilde{w} = 0$ falls outside the fiducial interval thus established, the
conclusion is that the difference between the means is significant at the
1% level. That is, $\tilde{w} \neq 0$ and hence $\tilde{x}_1 \neq \tilde{x}_2$.

If the two samples are equal in number so that the variates can be
paired in some manner we may compute (26) by a different method.

Let $N = N_1 = N_2$, $w = x_1 - x_2$, and compute $\bar{w}$ and $\sum (w - \bar{w})^2$.
Then

(27)    $t = \dfrac{\bar{w} - \tilde{w}}{s_w/\sqrt{N - 1}} = \dfrac{\bar{w} - \tilde{w}}{\left[\dfrac{\sum (w - \bar{w})^2}{N(N - 1)}\right]^{1/2}}$,

where $s_w$ denotes the standard deviation of the N values of w.
The last expression is sometimes called Bessel’s Formula.


Example 5. (Snedecor¹²) Imagine a newly discovered apple, attractive in
appearance, delicious in flavor, having apparently all the qualifications of
success. It has been christened “King.” Only its yielding capacities in various
localities is yet to be tested. The following procedure is decided upon. King is
planted adjacent to Standard in 15 orchards scattered about the region suitable
for production. Years later, when the trees have matured, the yields are
measured and recorded in the following table where $x_1$ refers to King, $x_2$ to Standard,
and $w = x_1 - x_2$. The yields are in bushels.


  x_1     x_2      w     (w - w̄)²

   13      11      2       16
   12       6      6        0
   10       3      7        1
    6       1      5        1
   13       7      6        0
   15      10      5        1
   19       9     10       16
   10       4      6        0
   11       3      8        4
   11       6      5        1
   13       8      5        1
    9       5      4        4
   14       7      7        1
   12       6      6        0
   12       4      8        4

Totals            90       50


Substituting in (27) we get

$t = \dfrac{6 - \tilde{w}}{\left[\dfrac{50}{(15)(14)}\right]^{1/2}} = \dfrac{6 - \tilde{w}}{.488}.$

Interpolating in Table III for n = 14 and checking the result in the more
extensive table in Fisher’s text we find that P = .01 when t = 2.977. Then solving
the equation

$\dfrac{6 - \tilde{w}}{.488} = \pm 2.977$

we obtain $\tilde{w} = 4.55$ and $\tilde{w} = 7.45$. Since $\tilde{w} = 0$ is outside the interval from
4.55 to 7.45, the observed value of $\bar{w}$ differs significantly from either value of the
parameter. In other words, for these as well as for all values outside the fiducial
interval 4.55-7.45, we would reject (at the 1% level of significance) the null
hypothesis that there is no significant difference between the yields of the two
varieties, insofar as their means provide a criterion of judgment.
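
The computation of Example 5 may be retraced from the fifteen pairs of yields. The sketch below takes the critical value 2.977 from the text rather than computing it; the variable names are introduced here for illustration.

```python
from math import sqrt

king =     [13, 12, 10, 6, 13, 15, 19, 10, 11, 11, 13, 9, 14, 12, 12]
standard = [11, 6, 3, 1, 7, 10, 9, 4, 3, 6, 8, 5, 7, 6, 4]

w = [a - b for a, b in zip(king, standard)]     # paired differences
N = len(w)
w_bar = sum(w) / N
sum_sq = sum((d - w_bar) ** 2 for d in w)
std_err = sqrt(sum_sq / (N * (N - 1)))          # denominator of (27)
t_crit = 2.977                                  # quoted in the text for n = 14, P = .01
lower, upper = w_bar - t_crit * std_err, w_bar + t_crit * std_err
print(w_bar, sum_sq, round(std_err, 3), round(lower, 2), round(upper, 2))
```

The interval so obtained lies wholly above zero, which is the basis for rejecting the hypothesis of equal mean yields.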

(c) For the variance. In (25) of Chapter VII we obtained the
distribution of $s^2$ which we will now write in the form

$H(s^2)\,ds^2 = \dfrac{\left(\dfrac{N s^2}{2\sigma^2}\right)^{(N-3)/2} e^{-N s^2/2\sigma^2}}{\left(\dfrac{2\sigma^2}{N}\right)\Gamma\!\left(\dfrac{N - 1}{2}\right)}\,ds^2.$

If we let $\chi^2 = N s^2/\sigma^2$ we get the $\chi^2$ distribution given in (12) with
N replacing k,

$T(\chi^2)\,d\chi^2 = \dfrac{e^{-\chi^2/2}\,(\chi^2)^{(N-3)/2}}{2^{(N-1)/2}\,\Gamma\!\left(\dfrac{N - 1}{2}\right)}\,d\chi^2.$

That we should thus obtain (12) is more than a coincidence, because
it turns out that $N s^2/\sigma^2$ actually is $\chi^2$ for N observations made on a
single magnitude. If now we let n = N - 1 we obtain the
distribution for n degrees of freedom,

(28)    $T_n(\chi^2)\,d\chi^2 = \dfrac{e^{-\chi^2/2}\,(\chi^2)^{(n-2)/2}}{2^{n/2}\,\Gamma\!\left(\dfrac{n}{2}\right)}\,d\chi^2.$

To determine the fiducial limits of $\sigma^2$ we first observe from (3) of
Chapter VII that $N s^2 = n\hat{\sigma}^2 = \sum_{i=1}^{N} (x_i - \bar{x})^2$, and therefore we may
write $\chi^2 = n\hat{\sigma}^2/\sigma^2$. If now we make the claim

$\dfrac{n\hat{\sigma}^2}{\chi_2^2} < \sigma^2 < \dfrac{n\hat{\sigma}^2}{\chi_1^2}$,

where $\chi_1^2$ and $\chi_2^2$ are arbitrarily chosen constants ($\chi_1^2 < \chi_2^2$), then
our “measure of confidence” in the correctness of this claim is given
by $I_n(\chi_1^2) - I_n(\chi_2^2)$, where

$I_n(\chi^2) = \int_{\chi^2}^{\infty} T_n(\chi^2)\,d\chi^2.$

Values of $I_n(\chi^2)$ can be obtained from Pearson’s Tables.³
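
When n is even the tail area $I_n(\chi^2)$ has a simple closed form, so the measure of confidence can be computed without tables. The sketch below is an illustration under stated assumptions: n = 14, a sum of squared deviations equal to 50, and the two chosen constants 5.629 and 26.119, which are the tabled 97.5% and 2.5% points for 14 degrees of freedom. None of these numbers is prescribed by the text.

```python
from math import exp

def upper_tail_even(chi2, n):
    """I_n(chi2) = P(chi-square >= chi2) for an even number n of degrees of freedom.

    For even n the integral of (28) equals exp(-x) * sum_{k < n/2} x^k / k!, x = chi2 / 2.
    """
    if n % 2:
        raise ValueError("this closed form needs an even n")
    x = chi2 / 2.0
    term = total = 1.0
    for k in range(1, n // 2):
        term *= x / k
        total += term
    return exp(-x) * total

n, sum_sq = 14, 50.0                # hypothetical: n = N - 1 and the sum of squared deviations
chi2_1, chi2_2 = 5.629, 26.119      # chosen constants, chi2_1 < chi2_2
confidence = upper_tail_even(chi2_1, n) - upper_tail_even(chi2_2, n)
limits = (sum_sq / chi2_2, sum_sq / chi2_1)
print(round(confidence, 3), [round(v, 2) for v in limits])
```

Other pairs of constants with the same difference of tail areas give other intervals with the same measure of confidence, which is why the constants are described as arbitrarily chosen.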






For further study of fiducial inference and its applications to testing
hypotheses, the reader is referred to the publications of Fisher,⁹
Neyman,¹⁰ and Wilks.¹¹

Notes and References 

1. In preparing §§ 1 and 2, the writer has derived much help from
Probability and Its Engineering Uses, Fry, D. Van Nostrand Co., and
Statistical Inference, 1936-37, Wilks, Edwards Brothers.

2. This part of § 1 and practically all of § 2 are taken from Professor Wilks’
Lectures and reproduced with his permission.

3. Tables for Statisticians and Biometricians.

4. R. A. Fisher, Statistical Methods for Research Workers.

5. Karl Pearson, On the Criterion that a Given Set of Deviations from the Probable
in the Case of Correlated Variables is Such that Can Reasonably be Supposed
to have Arisen from Random Sampling, Phil. Mag. 5th Series, vol. 50
(1900), pp. 157-175.

6. R. A. Fisher, Uncertain Inference, Proc. Amer. Acad. Arts and Sciences,
vol. 71, no. 4, pp. 245-257.

7. E. C. Molina, Bayes Theorem, Annals of Mathematical Statistics, vol. 2,
no. 1, pp. 23-37.

8. H. L. Rietz, On a Certain Advance in Statistical Inference, American
Mathematical Monthly, vol. 45, pp. 149-158.

9. R. A. Fisher, Inverse Probability, Proc. Camb. Phil. Soc., vol. 26, 1930,
p. 528; Proc. Royal Soc., A, vol. 139, 1933, p. 343.

10. J. Neyman, On Different Aspects of the Representative Method, the Method of
Stratified Sampling, and the Method of Purposive Selection, Journal Royal
Statistical Society, vol. 97, 1934, pp. 558-606.

11. S. S. Wilks, (a) Lectures on Statistical Inference, 1936-1937.
(b) Fiducial Distributions in Fiducial Inference, Annals of Mathematical
Statistics, vol. IX, no. 4, pp. 272-280. (This is an expository paper.)

12. G. W. Snedecor, Statistical Methods. Collegiate Press, Inc., Ames, Iowa.



Exercises 

1. Read the following paper: The $\chi^2$-Test of Significance, T. C. Fry, Journal of
the American Statistical Association, vol. 33, pp. 513-525. (The three
papers following Fry’s exposition are also recommended.)

2. Toss seven coins 128 times and record the frequencies of heads. Apply the
$\chi^2$-test to the resulting distribution.

3. Graduate an appropriate distribution in Part I by means of the normal
curve and test the composite hypothesis that the observed distribution
was a sample from a normal universe having the mean and standard
deviation of the sample.

4. Give a report on $\chi^2$ and contingency tables.

5. (Chrystal) A bag contains three balls, each of which is either white or
black, all possible numbers of white being equally likely. Two at once
are drawn at random and prove to be white. What is the probability
that all of the balls are white? Ans. 3/4.

6. If, in Example 4, it is assumed that, initially, all possible numbers of white
balls in the urn are equally likely, what is the solution?

7. If N is large show that the 95% fiducial range of $\tilde{x}$ for a normal universe is
$\bar{x} \pm 1.96\,s/\sqrt{N - 3}$.

8. Making use of the references cited prepare a report on fiducial inference.





Review Problem 


A question arose in a physical education class as to whether thirteen-year-old
girls weigh, as a rule, more than thirteen-year-old boys. Suppose you wished to
make a thorough analysis of the data in the table below concerning weights of
boys and girls aged thirteen. Describe the tests you might apply, the reasoning
and assumptions underlying these, and the interpretation that might be placed
on the results.


Weight (pounds)          Frequency
Class Marks          Boys       Girls

    42.5                1           0
    48.5                3           1
    54.5                9           7
    60.5               33          37
    66.5               65          41
    72.5               80          59
    78.5               72          58
    84.5               41          48
    90.5               27          23
    96.5                7          26
   102.5                4          16
   108.5                2           5
   114.5                1           3
   120.5                0           2

  Totals              345         326


The following points are suggested for discussion: 

(a) Is there a clear difference between the two distributions? How would
you test this: from the means, from the variances, from the samples as a whole?

(b) 32.3% of the boys and 26.4% of the girls have weights less than 69.5
pounds. Is this difference significant?

(c) Within what limits would you say that the mean and the standard deviation
in the population of thirteen-year-old boys (from which you have the sample of
345) are almost certain to lie in each case?

(d) Summarize your results. 
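As one possible starting point for points (a) and (b), the grouped data may be reduced by machine. The sketch below is illustrative only. It assumes large-sample normal tests, uses the divisor N in the variance, and pools the two proportions under the null hypothesis; the function names are hypothetical and are not taken from the text.

```python
from math import sqrt

# Class marks 42.5, 48.5, ..., 120.5 and the frequencies from the table.
marks = [42.5 + 6 * i for i in range(14)]
boys = [1, 3, 9, 33, 65, 80, 72, 41, 27, 7, 4, 2, 1, 0]
girls = [0, 1, 7, 37, 41, 59, 58, 48, 23, 26, 16, 5, 3, 2]

def grouped_stats(marks, freqs):
    """Return (N, mean, standard deviation) of a grouped distribution."""
    n = sum(freqs)
    mean = sum(m * f for m, f in zip(marks, freqs)) / n
    var = sum(f * (m - mean) ** 2 for m, f in zip(marks, freqs)) / n
    return n, mean, sqrt(var)

def z_means(stats1, stats2):
    """Large-sample ratio of the difference of means to its standard error."""
    n1, m1, s1 = stats1
    n2, m2, s2 = stats2
    return (m2 - m1) / sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)

def z_proportions(x1, n1, x2, n2):
    """Ratio of the difference of two proportions to its standard error,
    using the pooled proportion as the estimate under the null hypothesis."""
    p = (x1 + x2) / (n1 + n2)
    se = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se

boys_stats = grouped_stats(marks, boys)
girls_stats = grouped_stats(marks, girls)
z_mean = z_means(boys_stats, girls_stats)
# The first five classes lie below 69.5 pounds.
z_prop = z_proportions(sum(boys[:5]), boys_stats[0],
                       sum(girls[:5]), girls_stats[0])
print(boys_stats, girls_stats, z_mean, z_prop)
```

Each ratio would then be referred to the normal probability scale of Table I. The comparison of variances and of the samples as a whole calls for the F and $\chi^2$ methods, which this sketch does not attempt.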






APPENDIX 

Tables 

I. Ordinates and Areas of the Normal Curve. 

II. 5% and 1% Points for the Distribution of F. 

III. $\chi^2$ Probability Scale.



Table I. Ordinates and Areas of the Normal Curve, $\phi(t) = \frac{1}{\sqrt{2\pi}} e^{-t^2/2}$

(Area denotes $\int_0^t \phi(t)\,dt$.)

   t    φ(t)     Area         t    φ(t)     Area         t    φ(t)     Area

 .00   .39894   .00000      .45   .36053   .17364      .90   .26609   .31594
 .01   .39892   .00399      .46   .35889   .17724      .91   .26369   .31859
 .02   .39886   .00798      .47   .35723   .18082      .92   .26129   .32121
 .03   .39876   .01197      .48   .35553   .18439      .93   .25888   .32381
 .04   .39862   .01595      .49   .35381   .18793      .94   .25647   .32639
 .05   .39844   .01994      .50   .35207   .19146      .95   .25406   .32894
 .06   .39822   .02392      .51   .35029   .19497      .96   .25164   .33147
 .07   .39797   .02790      .52   .34849   .19847      .97   .24923   .33398
 .08   .39767   .03188      .53   .34667   .20194      .98   .24681   .33646
 .09   .39733   .03586      .54   .34482   .20540      .99   .24439   .33891
 .10   .39695   .03983      .55   .34294   .20884     1.00   .24197   .34134
 .11   .39654   .04380      .56   .34105   .21226     1.01   .23955   .34375
 .12   .39608   .04776      .57   .33912   .21566     1.02   .23713   .34614
 .13   .39559   .05172      .58   .33718   .21904     1.03   .23471   .34850
 .14   .39505   .05567      .59   .33521   .22240     1.04   .23230   .35083
 .15   .39448   .05962      .60   .33322   .22575     1.05   .22988   .35314
 .16   .39387   .06356      .61   .33121   .22907     1.06   .22747   .35543
 .17   .39322   .06749      .62   .32918   .23237     1.07   .22506   .35769
 .18   .39253   .07142      .63   .32713   .23565     1.08   .22265   .35993
 .19   .39181   .07535      .64   .32506   .23891     1.09   .22025   .36214
 .20   .39104   .07926      .65   .32297   .24215     1.10   .21785   .36433
 .21   .39024   .08317      .66   .32086   .24537     1.11   .21546   .36650
 .22   .38940   .08706      .67   .31874   .24857     1.12   .21307   .36864
 .23   .38853   .09095      .68   .31659   .25175     1.13   .21069   .37076
 .24   .38762   .09483      .69   .31443   .25490     1.14   .20831   .37286
 .25   .38667   .09871      .70   .31225   .25804     1.15   .20594   .37493
 .26   .38568   .10257      .71   .31006   .26115     1.16   .20357   .37698
 .27   .38466   .10642      .72   .30785   .26424     1.17   .20121   .37900
 .28   .38361   .11026      .73   .30563   .26730     1.18   .19886   .38100
 .29   .38251   .11409      .74   .30339   .27035     1.19   .19652   .38298
 .30   .38139   .11791      .75   .30114   .27337     1.20   .19419   .38493
 .31   .38023   .12172      .76   .29887   .27637     1.21   .19186   .38686
 .32   .37903   .12552      .77   .29659   .27935     1.22   .18954   .38877
 .33   .37780   .12930      .78   .29431   .28230     1.23   .18724   .39065
 .34   .37654   .13307      .79   .29200   .28524     1.24   .18494   .39251
 .35   .37524   .13683      .80   .28969   .28814     1.25   .18265   .39435
 .36   .37391   .14058      .81   .28737   .29103     1.26   .18037   .39617
 .37   .37255   .14431      .82   .28504   .29389     1.27   .17810   .39796
 .38   .37115   .14803      .83   .28269   .29673     1.28   .17585   .39973
 .39   .36973   .15173      .84   .28034   .29955     1.29   .17360   .40147
 .40   .36827   .15542      .85   .27798   .30234     1.30   .17137   .40320
 .41   .36678   .15910      .86   .27562   .30511     1.31   .16915   .40490
 .42   .36526   .16276      .87   .27324   .30785     1.32   .16694   .40658
 .43   .36371   .16640      .88   .27086   .31057     1.33   .16474   .40824
 .44   .36213   .17003      .89   .26848   .31327     1.34   .16256   .40988



Table I (continued). Ordinates and Areas of the Normal Curve, $\phi(t) = \frac{1}{\sqrt{2\pi}} e^{-t^2/2}$

(Area denotes $\int_0^t \phi(t)\,dt$.)

   t    φ(t)     Area         t    φ(t)     Area         t    φ(t)     Area

1.35   .16038   .41149     1.80   .07895   .46407     2.25   .03174   .48778
1.36   .15822   .41309     1.81   .07754   .46485     2.26   .03103   .48809
1.37   .15608   .41466     1.82   .07614   .46562     2.27   .03034   .48840
1.38   .15395   .41621     1.83   .07477   .46638     2.28   .02965   .48870
1.39   .15183   .41774     1.84   .07341   .46712     2.29   .02898   .48899
1.40   .14973   .41924     1.85   .07206   .46784     2.30   .02833   .48928
1.41   .14764   .42073     1.86   .07074   .46856     2.31   .02768   .48956
1.42   .14556   .42220     1.87   .06943   .46926     2.32   .02705   .48983
1.43   .14350   .42364     1.88   .06814   .46995     2.33   .02643   .49010
1.44   .14146   .42507     1.89   .06687   .47062     2.34   .02582   .49036
1.45   .13943   .42647     1.90   .06562   .47128     2.35   .02522   .49061
1.46   .13742   .42786     1.91   .06439   .47193     2.36   .02463   .49086
1.47   .13542   .42922     1.92   .06316   .47257     2.37   .02406   .49111
1.48   .13344   .43056     1.93   .06195   .47320     2.38   .02349   .49134
1.49   .13147   .43189     1.94   .06077   .47381     2.39   .02294   .49158
1.50   .12952   .43319     1.95   .05959   .47441     2.40   .02239   .49180
1.51   .12758   .43448     1.96   .05844   .47500     2.41   .02186   .49202
1.52   .12566   .43574     1.97   .05730   .47558     2.42   .02134   .49224
1.53   .12376   .43699     1.98   .05618   .47615     2.43   .02083   .49245
1.54   .12188   .43822     1.99   .05508   .47670     2.44   .02033   .49266
1.55   .12001   .43943     2.00   .05399   .47725     2.45   .01984   .49286
1.56   .11816   .44062     2.01   .05292   .47778     2.46   .01936   .49305
1.57   .11632   .44179     2.02   .05186   .47831     2.47   .01889   .49324
1.58   .11450   .44295     2.03   .05082   .47882     2.48   .01842   .49343
1.59   .11270   .44408     2.04   .04980   .47932     2.49   .01797   .49361
1.60   .11092   .44520     2.05   .04879   .47982     2.50   .01753   .49379
1.61   .10915   .44630     2.06   .04780   .48030     2.51   .01709   .49396
1.62   .10741   .44738     2.07   .04682   .48077     2.52   .01667   .49413
1.63   .10567   .44845     2.08   .04586   .48124     2.53   .01625   .49430
1.64   .10396   .44950     2.09   .04491   .48169     2.54   .01585   .49446
1.65   .10226   .45053     2.10   .04398   .48214     2.55   .01545   .49461
1.66   .10059   .45154     2.11   .04307   .48257     2.56   .01506   .49477
1.67   .09893   .45254     2.12   .04217   .48300     2.57   .01468   .49492
1.68   .09728   .45352     2.13   .04128   .48341     2.58   .01431   .49506
1.69   .09566   .45449     2.14   .04041   .48382     2.59   .01394   .49520
1.70   .09405   .45543     2.15   .03955   .48422     2.60   .01358   .49534
1.71   .09246   .45637     2.16   .03871   .48461     2.61   .01323   .49547
1.72   .09089   .45728     2.17   .03788   .48500     2.62   .01289   .49560
1.73   .08933   .45818     2.18   .03706   .48537     2.63   .01256   .49573
1.74   .08780   .45907     2.19   .03626   .48574     2.64   .01223   .49585
1.75   .08628   .45994     2.20   .03547   .48610     2.65   .01191   .49598
1.76   .08478   .46080     2.21   .03470   .48645     2.66   .01160   .49609
1.77   .08329   .46164     2.22   .03394   .48679     2.67   .01130   .49621
1.78   .08183   .46246     2.23   .03319   .48713     2.68   .01100   .49632
1.79   .08038   .46327     2.24   .03246   .48745     2.69   .01071   .49643



Table I (continued). Ordinates and Areas of the Normal Curve, $\phi(t) = \frac{1}{\sqrt{2\pi}} e^{-t^2/2}$

(Area denotes $\int_0^t \phi(t)\,dt$.)

   t    φ(t)     Area         t    φ(t)     Area         t    φ(t)     Area

2.70   .01042   .49653     3.15   .00279   .49918     3.60   .00061   .49984
2.71   .01014   .49664     3.16   .00271   .49921     3.61   .00059   .49985
2.72   .00987   .49674     3.17   .00262   .49924     3.62   .00057   .49985
2.73   .00961   .49683     3.18   .00254   .49926     3.63   .00055   .49986
2.74   .00935   .49693     3.19   .00246   .49929     3.64   .00053   .49986
2.75   .00909   .49702     3.20   .00238   .49931     3.65   .00051   .49987
2.76   .00885   .49711     3.21   .00231   .49934     3.66   .00049   .49987
2.77   .00861   .49720     3.22   .00224   .49936     3.67   .00047   .49988
2.78   .00837   .49728     3.23   .00216   .49938     3.68   .00046   .49988
2.79   .00814   .49736     3.24   .00210   .49940     3.69   .00044   .49989
2.80   .00792   .49744     3.25   .00203   .49942     3.70   .00042   .49989
2.81   .00770   .49752     3.26   .00196   .49944     3.71   .00041   .49990
2.82   .00748   .49760     3.27   .00190   .49946     3.72   .00039   .49990
2.83   .00727   .49767     3.28   .00184   .49948     3.73   .00038   .49990
2.84   .00707   .49774     3.29   .00178   .49950     3.74   .00037   .49991
2.85   .00687   .49781     3.30   .00172   .49952     3.75   .00035   .49991
2.86   .00668   .49788     3.31   .00167   .49953     3.76   .00034   .49992
2.87   .00649   .49795     3.32   .00161   .49955     3.77   .00033   .49992
2.88   .00631   .49801     3.33   .00156   .49957     3.78   .00031   .49992
2.89   .00613   .49807     3.34   .00151   .49958     3.79   .00030   .49992
2.90   .00595   .49813     3.35   .00146   .49960     3.80   .00029   .49993
2.91   .00578   .49819     3.36   .00141   .49961     3.81   .00028   .49993
2.92   .00562   .49825     3.37   .00136   .49962     3.82   .00027   .49993
2.93   .00545   .49831     3.38   .00132   .49964     3.83   .00026   .49994
2.94   .00530   .49836     3.39   .00127   .49965     3.84   .00025   .49994
2.95   .00514   .49841     3.40   .00123   .49966     3.85   .00024   .49994
2.96   .00499   .49846     3.41   .00119   .49968     3.86   .00023   .49994
2.97   .00485   .49851     3.42   .00115   .49969     3.87   .00022   .49995
2.98   .00471   .49856     3.43   .00111   .49970     3.88   .00021   .49995
2.99   .00457   .49861     3.44   .00107   .49971     3.89   .00021   .49995
3.00   .00443   .49865     3.45   .00104   .49972     3.90   .00020   .49995
3.01   .00430   .49869     3.46   .00100   .49973     3.91   .00019   .49995
3.02   .00417   .49874     3.47   .00097   .49974     3.92   .00018   .49996
3.03   .00405   .49878     3.48   .00094   .49975     3.93   .00018   .49996
3.04   .00393   .49882     3.49   .00090   .49976     3.94   .00017   .49996
3.05   .00381   .49886     3.50   .00087   .49977     3.95   .00016   .49996
3.06   .00370   .49889     3.51   .00084   .49978     3.96   .00016   .49996
3.07   .00358   .49893     3.52   .00081   .49978     3.97   .00015   .49996
3.08   .00348   .49897     3.53   .00079   .49979     3.98   .00014   .49997
3.09   .00337   .49900     3.54   .00076   .49980     3.99   .00014   .49997
3.10   .00327   .49903     3.55   .00073   .49981
3.11   .00317   .49906     3.56   .00071   .49981
3.12   .00307   .49910     3.57   .00068   .49982
3.13   .00298   .49913     3.58   .00066   .49983
3.14   .00288   .49916     3.59   .00063   .49983
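Entries of Table I can be checked, or values between the tabulated arguments obtained, from the definitions themselves. The short sketch below is offered only as an illustration. It relies on the identity $\int_0^t \phi(t)\,dt = \frac{1}{2}\,\mathrm{erf}(t/\sqrt{2})$ and on the error function of the Python standard library; the function names `ordinate` and `area_from_zero` are hypothetical.

```python
from math import erf, exp, pi, sqrt

def ordinate(t):
    """Ordinate of the normal curve, phi(t) = exp(-t^2/2) / sqrt(2*pi)."""
    return exp(-t * t / 2) / sqrt(2 * pi)

def area_from_zero(t):
    """Area under the normal curve from 0 to t, computed through erf."""
    return 0.5 * erf(t / sqrt(2))

# Print a few rows in the layout of Table I: t, ordinate, area.
for t in (0.50, 1.00, 1.96, 3.00):
    print(f"{t:4.2f}   {ordinate(t):.5f}   {area_from_zero(t):.5f}")
```

Rounded to five places, the printed values should agree with the corresponding rows of the table, for example the area .47500 at t = 1.96.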



Table II.* 5% (Roman Type) and 1% (Bold Face Type) Points for the Distribution of F

* Reproduced from Statistical Methods by G. W. Snedecor by permission of the author and the publisher, Collegiate Press, Inc.



























Table II (continued). 5% (Roman Type) and 1% (Bold Face Type) Points for the Distribution of F

[Tabulated values, arranged by degrees of freedom for the greater mean square, are not legible in this copy.]







































































































Table II (concluded). 5% (Roman Type) and 1% (Bold Face Type) Points for the Distribution of F

[Tabulated values, arranged by degrees of freedom for the greater mean square, are not legible in this copy.]












Table III. Table of $\chi^2$ Probability Scale (from R. A. Fisher’s Table)

Degrees of         Chance of Exceeding Given Value of $\chi^2$
Freedom n      .50      .30      .20      .10      .05      .02      .01

                             Values of $\chi^2$

     1         .45     1.07     1.64     2.71     3.84     5.41     6.63
     2        1.39     2.41     3.22     4.60     5.99     7.82     9.21
     3        2.37     3.66     4.64     6.25     7.81     9.84    11.34
     4        3.36     4.88     5.99     7.78     9.49    11.67    13.28
     5        4.35     6.06     7.29     9.24    11.07    13.39    15.09
     6        5.35     7.23     8.56    10.64    12.59    15.03    16.81
     7        6.35     8.38     9.80    12.02    14.07    16.62    18.47
     8        7.34     9.52    11.03    13.36    15.51    18.17    20.09
     9        8.34    10.66    12.24    14.68    16.92    19.68    21.67
    10        9.34    11.78    13.44    15.99    18.31    21.16    23.21
    11       10.34    12.90    14.63    17.27    19.67    22.62    24.72
    12       11.34    14.01    15.81    18.55    21.03    24.05    26.22
    13       12.34    15.12    16.98    19.81    22.36    25.47    27.69
    14       13.34    16.22    18.15    21.06    23.68    26.87    29.14
    15       14.34    17.32    19.31    22.31    25.00    28.26    30.58
    16       15.34    18.42    20.46    23.54    26.30    29.63    32.00
    17       16.34    19.51    21.61    24.77    27.59    30.99    33.41
    18       17.34    20.60    22.76    25.99    28.87    32.35    34.80
    19       18.34    21.69    23.90    27.20    30.14    33.69    36.19
    20       19.34    22.77    25.04    28.41    31.41    35.02    37.57
    21       20.34    23.86    26.17    29.61    32.67    36.34    38.93
    22       21.34    24.94    27.30    30.81    33.92    37.66    40.29
    23       22.34    26.02    28.43    32.01    35.17    38.97    41.64
    24       23.34    27.10    29.55    33.20    36.41    40.27    42.98
    25       24.34    28.17    30.67    34.38    37.65    41.57    44.31
    26       25.34    29.25    31.79    35.56    38.88    42.86    45.64
    27       26.34    30.32    32.91    36.74    40.11    44.14    46.96
    28       27.34    31.39    34.03    37.92    41.34    45.42    48.28
    29       28.34    32.46    35.14    39.09    42.56    46.69    49.59
    30       29.34    33.53    36.25    40.26    43.77    47.96    50.89


For larger values of n, $\sqrt{2\chi^2} - \sqrt{2n - 1}$ may be referred approximately to the
normal probability scale.
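The rule for larger values of n can be sketched as a computation. What follows is a minimal illustration under assumptions of our own: the deviate $\sqrt{2\chi^2} - \sqrt{2n - 1}$ is treated as a unit normal variable, the upper tail of the normal curve is obtained from the error function of the Python standard library, and the function name `chi2_tail_approx` is hypothetical.

```python
from math import erf, sqrt

def chi2_tail_approx(chi2, n):
    """Approximate chance of exceeding chi2 with n degrees of freedom,
    referring sqrt(2*chi2) - sqrt(2*n - 1) to the normal probability scale."""
    d = sqrt(2 * chi2) - sqrt(2 * n - 1)
    # Upper tail of the unit normal curve beyond the deviate d.
    return 0.5 * (1 - erf(d / sqrt(2)))

# Try the rule at the edge of the table: n = 30, tabulated 5% value 43.77.
print(chi2_tail_approx(43.77, 30))
```

At n = 30 the approximate chance comes out somewhat under the tabulated .05, which gives a rough idea of how well the rule may be expected to serve just beyond the range of the table.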





INDEX 


Analysis of variance, 147 

for testing significance between means, 151
for testing linearity of regression, 153
Array, 66 
mean of, 66 
variance of, 66 
Bartlett, 141 
Bayes’ theorem, 173 
Blakeman criterion, 155 
Bernoulli distribution, 10, 15, 16 
recursion formula for moments 
of, 15 

Bernoulli theorem, 9, 114 
Beta function, 39 
Binomial distribution, 9, see also 
Bernoulli, 

approximation, by normal 
curve, 19 

by Poisson exponential, 31 
Bivariate, distribution, see joint, 
normal surface, 70 
Brown, 29 
Camp, 24n, 75, 121 
Carver, 48, 112, 113 
Cauchy curve, 45 
Chi-square, distribution of, 166 
test of goodness of fit, 166 
tabular values of, corresponding to given probabilities, Appendix
Church, 120 

Class frequencies, standard error 
and correlation of errors in, 
27 

Combinations and permutations, 
4 


Confidence or fiducial limits for 
difference between two means, 
183 

population mean, 181 
population variance, 185 
Contingency tables, reference to, 
171 

Coolidge, 5 

Craig, A. T., 15, 128, 149, 176 
Craig, C. C., 51 
Correlation coefficient, 65 
multiple, 87 
partial, 93 

testing significance of, 155 
total, 80 

transformation of, 159 
Correlation ratio eta, reference to 
in Part I, 154 

Correlated variables, definition 
of, 64 

Co-variance, 65 

Degrees of freedom, generally, 128
in analysis of variance, 149
chi-square test, 169, 170
correlation ratio, 154
Fisher’s t-distribution, 138
testing difference between means, 140, 153
testing linearity of regression, 154
unbiased estimates of population variance, 126
Deming and Birge, 137, 139
De Moivre-Laplace theorem, 21n
Dichotomy, 9, 164
Difference, testing significance of, between
correlation coefficients, 155




means, 140, 151 
proportions, 119 
sample variances, 145 
Distributions, generally, 43 
Bernoulli, 10 
binomial, 9 
Cauchy, 45 
Fisher’s t-, 137 
Fisher’s z-, 142 
Gram-Charlier, 59 
joint, 63 
marginal, 64 
normal, 49, 55 
normal bivariate, 70 
of means, 133 
of standard deviations, 135 
of variances, 134 
Pearson Type III, 49 
Poisson exponential, 29 
“ Student’s,” 128 
Estimates, unbiased, 125 
Expected value, 
of mean, 102 

of standard deviation, 127, 135 
of variance, 123, 135 
propositions concerning, 100 
Fiducial inference, theory of, 177 
Fiducial limits, 181 
Fisher’s derivation of “Student’s” distribution, 130
Fisher’s t-distribution, 137
table of, 138

Fisher’s z-distribution, 142 
Frequency curves, generally, 43 
Gram-Charlier, 59 
Pearson system of, 46 
Frequency surface, the (x, s)-, 136

normal, 70 

Fry, 5, 20n, 32, 39, 186 
Function, distribution, 43 
Beta, 39 
Gamma, 35 

incomplete Beta and Gamma, 41 


Gamma function, 35 
Geary, 134, 160 
Gram-Charlier series, 59 
Hermite polynomials, 59 
Homoscedastic arrays, 73, 92 
Hotelling, 2 

Hypergeometric series, 54 
Independence, definition of statistical, 64
Interaction, 149 
Irwin, 161 (ref. 22) 

Jackson, 95, 109 
Jacobian, 37 
Joint distributions, 63 
Large numbers, law of, 114 
Laplace, 175 
Levy and Roth, 5, 60n 
Limits, fiducial, 181 
Linear function, standard error 
of, 101 

Marginal distributions, 64 
Mathematical expectation, see expected value
Means,
test of significance of difference between, 140
testing variation in, 151 
Median, 61 
Mills, 162 
Molina, 174 

Moments, generally, 44, 64 
of Bernoulli distribution, 14 
of distribution of means from 
arbitrary universe, 102 
Multinomial law, 164 
Multiple correlation, 78 
coefficient, 87 
regression, 82 
Normal curve, 55 

connection with Gram-Charlier 
series, 59 

with Pearson system, 49 
normalized, 49 
quadrature of, 57 





recursion formula for moments 
of, 56 

reproductive property of, 109 
Normal equations, 83 
Neyman, 160 

Null hypothesis, 116, 141, 181 
Parameters, unbiased estimates 
of population, 125 
Partial correlation, 91 
coefficient, 93 
Pearson, Karl, 41, 168 
Pearson, E. S., 116, 121, 127, 160 
Permutations and combinations, 

4 

Probability, 2, 51, 54 
a priori, 2, 174 
a posteriori, 2, 174 
distributions, 16, 43, 63 
inverse, 174 

scale of sampling fluctuations, 
115 

Probable error, 26, 176
Proportions, testing significance 
of difference between, 119 
Rider, 136n, 161 

Rietz, 3n, 5, 10, 18, 20n, 28n, 97, 
111, 141, 177 
Regression, 
curves, 66 
lines, 67 
multiple, 82 

systems in normal surface, 71 
testing linear, 153 
Repeated trials, 7 
Reproductive property of normal 
law, 109 

Romanovsky, 135, 136, 161 

Sample, 97 

size of, to have a given reliability, 118
Salvosa, 51 
Shewhart, 103, 113 
Significance, rule for level of, 117, 
138 


Small samples, generally, 123, 129
Snedecor, 145, 153, 184 
Soper, 158 

Standard error of estimate, 85 
Standard deviation of estimated 
values, 87 
Statistic, 98, 125 
Statistical independence, definition of, 64

Stirling’s approximation, 38, 165 
Stochastic convergence, 115 
Struik, 5 

“Student’s” distribution, 128
t-distribution, 137
Tchebycheff’s inequality, 113
Testing hypotheses about frequency distributions, 168
linearity of regression, 153 
Testing significance of 

correlation coefficients, 155 
difference between means, 140, 
151 

difference between proportions, 
119 

mean, when universe variance 
is known, 116 

mean, when universe variance 
is unknown, 128 
ratio between variances, 145 
Tetrachoric correlation, 74 
Thurstone, 76 
Total correlation, 80 
Transformation of correlation coefficient, 159

Type A distribution, Gram-Charlier, 59

Type III distribution, Pearson, 
49 

Unbiased estimates of population 
parameter, 125 
Universe, 
finite, 112 
non-normal, 111 
Uspensky, 5





Variance 

analysis of, 147 
ratio, 145n 

within and between classes, 149 
Weldon’s data, 118 


Wilks, 186 

Yule and Kendall, 95 
z-distribution

Fisher’s, 142 
“ Student’s,” 130 



















